Speaker: Manfred Stede, Applied Computer Linguistics, University of Potsdam

Title: Text Mining in a Nutshell

Time: Wed, Apr 6, 2011, 10am

Place: MPI für Molekulare Pflanzenphysiologie, Room 0.21 in The Box

Roughly ten years ago, the term "Text Mining" slowly became popular for what used to be called "Natural Language Understanding". This talk first gives a brief recap of that development, explaining the reasons for and consequences of the "statistical turn" in Computational Linguistics. Then, we provide a sketch of the current state of the art in Text Mining: What are the central methods for robustly extracting information from text, to what extent are these methods domain- or genre-dependent, and what quality can be obtained? In particular, we will focus on two specific subtasks in order to show the spectrum from knowledge-lean, "superficial" analysis to "deep" analysis with knowledge-based methods: on the one hand, automatic text summarization, and on the other, the extraction of semantic relationships (roughly: extracting the "who did what to whom" information from a sentence or sequence of sentences).

