Thesis Research

Dissertation

Abstract

This thesis motivates and describes the Generalized Immediate Dominance/Linear Precedence (GIDLP) formalism, a formalism capable of serving as a processing backbone for linearization-based grammars in the Head-Driven Phrase Structure Grammar (HPSG) framework. Complementing the work on the formalism, the thesis defines and implements an efficient parsing algorithm for GIDLP grammars.

Representing a prominent tradition within HPSG, linearization-based HPSG assumes that the domain of word order can be larger than the local tree. This supports elegant and general linguistic analyses for (relatively) free word order languages, including the possibility of licensing discontinuous constituents.
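To make the idea of a discontinuous constituent concrete, here is a minimal Python sketch. It assumes, purely for illustration and not based on the thesis or the fsparser source, that each constituent records the word positions it covers as an integer bitmask. When the word order domain is larger than the local tree, two daughters only need to cover disjoint sets of words rather than adjacent ones, so the combined constituent may contain gaps. The names `combine` and `is_continuous` are hypothetical.

```python
from typing import Optional

# Hypothetical representation (not from the thesis): bit i is set
# if the constituent covers word position i.

def combine(a: int, b: int) -> Optional[int]:
    """Combine two daughters' coverage; only disjointness is required, not adjacency."""
    if a & b:
        return None  # the same word cannot be used by both daughters
    return a | b

def is_continuous(cov: int) -> bool:
    """True if the covered positions form one unbroken block of words."""
    if cov == 0:
        return True
    low = cov & -cov       # value of the lowest set bit
    shifted = cov // low   # shift the block down so it starts at bit 0
    return (shifted & (shifted + 1)) == 0  # a contiguous block looks like 0b0...0111

a = 0b1001  # daughter covering words 0 and 3
b = 0b0010  # daughter covering word 1
ab = combine(a, b)
print(bin(ab), is_continuous(ab))  # 0b1011 False
```

In a parser built on a phrase structure backbone, by contrast, the two daughters would have to be adjacent, so a combined coverage like `0b1011` could never arise.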

For processing with an HPSG grammar, most systems depend on parsing algorithms that make use of a phrase structure backbone, a part of the grammar that has been set aside and given a distinguished role in the parsing process. This contrasts with approaches that view parsing as a general constraint-solving task, in which general methods of logical reasoning are applied to the constraints in an HPSG grammar. Processing backbones support efficient parsing algorithms, but they restrict the class of HPSG theories that can be encoded to those employing a phrase structure backbone, which excludes linearization-HPSG grammars.

The GIDLP formalism resolves the dilemma between the desire to encode linguistically general and elegant linearization-HPSG analyses and the need for a processing backbone. GIDLP allows linguists to specify grammars with two kinds of statements: linear precedence constraints, which operate within explicitly declared word order domains that can extend beyond the local tree, and immediate dominance rules, whose right-hand sides the grammar writer can arrange so as to minimize the number of parsing hypotheses that must be explored. The GIDLP parsing algorithm developed in the thesis supports efficient processing by making direct use of linear precedence constraints during parsing.
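As a rough illustration of how linear precedence constraints can be applied to a word order domain while parsing, the following Python sketch checks a set of constraints against the elements currently in a domain. It is not the GIDLP algorithm from the thesis: the `Element` type, the bitmask coverage, and the reading of a constraint (A, B) as "every A element ends before any B element begins" are all assumptions made for this example.

```python
from typing import Iterable, List, NamedTuple, Tuple

class Element(NamedTuple):
    """Hypothetical word order domain element: a category and a coverage bitmask."""
    cat: str
    coverage: int  # bit i set if the element covers word position i

def first_word(cov: int) -> int:
    return (cov & -cov).bit_length() - 1  # index of the lowest set bit

def last_word(cov: int) -> int:
    return cov.bit_length() - 1  # index of the highest set bit

def satisfies_lp(domain: Iterable[Element],
                 constraints: Iterable[Tuple[str, str]]) -> bool:
    """Check LP constraints (A, B), read as: every A element precedes every B element."""
    elems: List[Element] = list(domain)
    for a, b in constraints:
        for x in elems:
            if x.cat != a:
                continue
            for y in elems:
                # x must end strictly before y begins
                if y.cat == b and last_word(x.coverage) >= first_word(y.coverage):
                    return False
    return True

lp = [("det", "noun")]  # hypothetical constraint: det precedes noun
print(satisfies_lp([Element("det", 0b001), Element("noun", 0b010)], lp))  # True
print(satisfies_lp([Element("det", 0b100), Element("noun", 0b010)], lp))  # False
```

Because a violation among elements already in a domain cannot be repaired by adding more elements, a parser could run a check like this each time it extends a domain and discard the hypothesis immediately. The thesis describes how the GIDLP parser actually uses the constraints.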

Download

Software

If you are interested in the source code, an early alpha version is available.

Prerequisites for Compiling

You need to have the Boost libraries installed. I used version 1.32.0. I expect later versions to work just as well, but I have not tested earlier versions.

The project currently links statically against GMP. Working with GMP on Windows is somewhat tricky, because parts of GMP are written in an assembler dialect that MSVC does not understand. I've used Brian Gladman's port of GMP and included the resulting library files in the archive, so you may not need to install GMP yourself.

The included project files are for MSVC 7.1. While it should in theory be possible to use a different compiler to build the parser, I have not tried this.

GIDLP Grammars

There is currently no documentation for the GIDLP grammar format. In essence, GIDLP grammars are TDL files (the grammar syntax used in LKB and PET) with additional 'special' paths and types. These paths and types are declared in the file gidlp.tdl, which every GIDLP grammar must include. See the skeleton.tdl file for a minimal GIDLP grammar skeleton, and the other grammars/fsparser/*.tdl files for examples. Chapter 4 of my thesis explains the GIDLP grammar formalism.

The GIDLP parser does not have its own grammar compiler; instead, it reads the output of flop, the grammar compiler that is part of PET, which must be installed separately.

Usage

The parser executable takes two files as input, a compiled grammar file and a sentence file (one sentence per line), and produces an XML file as output. Typical invocation:

fsparser grammar.grm test.sents > results.xml
You can then open the results.xml file in an XML-aware browser. Note that you currently need to edit the XML file and change the hard-coded path in its first line.

Obtaining the Source Code

The source archive can be downloaded here.

Contents:
fsparser/: source for the GIDLP parser
libpet/: source for my DLL version of the PET code
grammars/fsparser/: sample grammars and test cases
parseview/: currently contains only the XSLT transform for viewing parse results

Pre-compiled binaries can be downloaded here. The fsparser.exe file generates only high-level output, while fsp-detail.exe generates more detailed output. (The choice of output level will eventually be made via command-line switches.)

As mentioned above, this is a very rough version of the software; I expect there will be various obstacles to overcome before it can be considered portable. Please let me know if you encounter difficulties using the software; I may not be able to address them immediately, but I will keep track of them.

OSU Thesis Setup

Instead of the OSU thesis class that's floating around, I developed a skeleton file based on the memoir class (so I could take advantage of that package's features). That file is available.


daniels@ling.osu.edu