Information Extraction from Political Texts Regarding the Middle East

In today's information-rich world of millions of pages of text and hypertext, one of the most fruitful areas for computational linguistics applications is information extraction. No one has the time or resources to read, much less find, all the information available on a given topic. By employing parsing technologies designed to provide semantic representations, we can design programs that identify and summarize important passages of text and store them in a database for future research. Current research focuses on comparing problem representations and the images of other parties as described by different speakers.

This project, a collaboration between researchers in linguistics and political science, focuses on extracting information from the texts of interviews and speeches given by key political players in the Middle East. Relevant portions of a document are identified in a first phase of analysis, which includes part-of-speech tagging with the Brill tagger, chunk parsing, and anaphor resolution. The selected sentences are then given a full grammatical and semantic analysis using the ERGO grammar. The lexicons are supplemented with resources such as COMLEX from the Linguistic Data Consortium, as well as technical vocabulary developed for the political domain.
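
The first phase described above can be illustrated with a minimal sketch. It is not the project's implementation. A small hand-written lexicon stands in for the Brill tagger, a simple grouping of determiner, adjective, and noun tags stands in for chunk parsing, and anaphor resolution is omitted. The names LEXICON, DOMAIN_TERMS, tag, chunk_noun_phrases, and select_relevant are hypothetical and were chosen only for this illustration.

```python
import re

# Hypothetical toy lexicon; a real system would use a trained tagger
# (the project description names the Brill tagger for this step).
LEXICON = {
    "the": "DT", "a": "DT",
    "peace": "NN", "treaty": "NN", "border": "NN",
    "minister": "NN", "weather": "NN",
    "new": "JJ", "warm": "JJ",
    "signed": "VBD", "was": "VBD", "discussed": "VBD",
}

# Assumed stand-in for the technical vocabulary of the political domain.
DOMAIN_TERMS = {"peace", "treaty", "border"}


def tag(sentence):
    """Assign a part-of-speech tag to each word, defaulting to NN."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    return [(tok, LEXICON.get(tok, "NN")) for tok in tokens]


def chunk_noun_phrases(tagged):
    """Group maximal runs of DT/JJ/NN tokens into noun-phrase chunks."""
    chunks, current = [], []
    for word, pos in tagged:
        if pos in {"DT", "JJ", "NN"}:
            current.append(word)
        else:
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def select_relevant(sentences):
    """Keep sentences with a noun-phrase chunk containing a domain term."""
    selected = []
    for sentence in sentences:
        chunks = chunk_noun_phrases(tag(sentence))
        if any(word in DOMAIN_TERMS for chunk in chunks for word in chunk):
            selected.append(sentence)
    return selected


if __name__ == "__main__":
    docs = [
        "The minister signed the peace treaty.",
        "The weather was warm.",
    ]
    for sentence in select_relevant(docs):
        print(sentence)
```

In this sketch only the selected sentences would be passed on to the more expensive full grammatical and semantic analysis, which is the reason for running a cheap first phase at all.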