Computational Parsing for Languages with Flexible Constituent Ordering

An important challenge for the computational processing of human languages is the extensive word order variation permitted by the grammars of many languages. Although there has been significant progress in the linguistic description of flexible ordering phenomena, most computational approaches to parsing are still based on formal grammars that assume a fixed correspondence between the rules that determine grammatical relations and the linear ordering of constituents. This project aims to develop computationally effective parsing strategies for a linguistic theory of linearization in which linear ordering relations are represented at a level that may be distinct from other grammatical and semantic relations. This approach to parsing is primarily motivated by languages with highly flexible ordering, such as German and Japanese, but it is not limited to those languages.
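As a rough illustration of why separating linear order from constituent structure enlarges the space a parser must search, the following Python sketch computes the shuffle (sequence union) of two ordered lists: every interleaving that preserves the internal order of each list. Treating shuffle as the relevant operation is an assumption suggested by the title of the publication listed below, and the function name and the labels `NP-dat`, `V` and `NP-acc` are hypothetical placeholders, not part of the project's grammar.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle(a: Sequence[T], b: Sequence[T]) -> List[List[T]]:
    """Return every interleaving of a and b that keeps each input's internal order.

    Hypothetical helper for illustration only; it enumerates all results,
    which a practical parser would want to avoid.
    """
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    # Either the first element of a is placed first, or the first element of b is.
    take_a = [[a[0]] + rest for rest in shuffle(a[1:], b)]
    take_b = [[b[0]] + rest for rest in shuffle(a, b[1:])]
    return take_a + take_b


if __name__ == "__main__":
    # One domain fixes NP-dat before V; NP-acc may appear anywhere relative to them.
    for order in shuffle(["NP-dat", "V"], ["NP-acc"]):
        print(" ".join(order))
```

Run as a script, this prints the three orders `NP-dat V NP-acc`, `NP-dat NP-acc V` and `NP-acc NP-dat V`. For inputs of lengths m and n the number of interleavings is the binomial coefficient C(m+n, m), so naive enumeration grows quickly as order domains get larger.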

The theory to be modeled is the result of recent NSF-sponsored research at Ohio State University (by Pollard, Kasper, and Levine) on linearization in the framework of Head-driven Phrase Structure Grammar (HPSG). Parsing algorithms and automatic compilation techniques are being explored for this new approach to linearization. A fragment of German grammar (based on the dissertation of Andreas Kathol) is also being implemented to provide a nontrivial test case for the parsing algorithms. The grammar is being developed using the ConTroll system (from Tuebingen), which implements the formalism underlying HPSG.

Publications

Kasper, Robert T., Mike Calcagno, and Paul C. Davis. 1998. Know When to Hold 'Em: Shuffling Deterministically in a Parser for Nonconcatenative Grammars. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL '98). Montreal, August 1998, pp. 663-669.