This thesis motivates and describes the Generalized Immediate Dominance/Linear Precedence (GIDLP) formalism: a formalism capable of serving as a processing backbone for linearization-based grammars in the Head-Driven Phrase Structure Grammar (HPSG) framework. Complementing the work on the formalism, the thesis defines and implements an efficient parsing algorithm for GIDLP grammars.
Representing a prominent tradition within HPSG, linearization-based HPSG assumes that the domain of word order can be larger than the local tree. This supports elegant and general linguistic analyses for (relatively) free word order languages, including the possibility of licensing discontinuous constituents.
For processing with an HPSG grammar, most systems depend on parsing algorithms that make use of a phrase structure backbone, a part of the grammar that has been set aside and given a distinguished role in the parsing process. This contrasts with systems that view parsing as a general constraint-solving task, in which general methods for logical reasoning are applied to the constraints present in an HPSG grammar. Processing backbones support efficient parsing algorithms, but they restrict the class of HPSG theories that can be encoded to those employing a phrase structure backbone, which excludes linearization-HPSG grammars.
The GIDLP formalism resolves the tension between the desire to encode linguistically general and elegant linearization-HPSG analyses and the need for a processing backbone. GIDLP allows linguists to specify grammars with linear precedence constraints that operate within explicitly declared word order domains extending beyond the local tree, as well as immediate dominance rules in which the grammar writer can arrange the right-hand side so as to minimize the number of parsing hypotheses that must be explored. The GIDLP parsing algorithm developed in the thesis supports efficient processing by making direct use of linear precedence constraints during parsing.
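To make the idea of linear precedence constraints inside a word order domain concrete, here is a minimal Python sketch. It is not the parsing algorithm from the thesis and does not use the GIDLP grammar syntax. The names `precedes` and `satisfies_lp`, the category labels, and the definition of precedence for discontinuous elements (every word of the left element precedes every word of the right one) are all assumptions made for illustration.

```python
def precedes(left_positions, right_positions):
    """Assumed definition: every word of the left element comes before every word of the right one."""
    return max(left_positions) < min(right_positions)


def satisfies_lp(domain, lp_rules):
    """Check linear precedence rules inside a single word order domain.

    domain   -- list of (category, positions) pairs; positions is a non-empty set
                of word indices and may be discontinuous, e.g. {0, 3}
    lp_rules -- list of (first, second) category pairs, read as "first < second"
    """
    for first, second in lp_rules:
        for cat_a, pos_a in domain:
            if cat_a != first:
                continue
            for cat_b, pos_b in domain:
                if cat_b == second and not precedes(pos_a, pos_b):
                    return False
    return True


# Hypothetical domain: a determiner, a noun, and a verb at positions 0, 1, 2.
domain = [("det", {0}), ("noun", {1}), ("verb", {2})]
print(satisfies_lp(domain, [("det", "noun")]))   # True
print(satisfies_lp(domain, [("verb", "noun")]))  # False

# A discontinuous element covering positions 0 and 3, with the verb at position 1.
split = [("np", {0, 3}), ("verb", {1})]
print(satisfies_lp(split, [("np", "verb")]))     # False under the assumed definition
```

Because a rule is checked only against the elements of the domain it is handed, the same rule can hold in one domain and be irrelevant in another, which is the sense in which the constraints are domain-relative.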
Official Version (pdf): This is a version of my dissertation that has been formatted according to the rules of the OSU graduate school (e.g. double-spaced, small margins).
Please cite as:
@PhdThesis{daniels:2005,
author = {Michael W. Daniels},
title = {Generalized {ID/LP} Grammar: A Formalism for Parsing
Linearization-Based {HPSG} Grammars},
school = {The Ohio State University},
url = {http://ling.osu.edu/~daniels/thesis.html},
year = 2005
}
Readable Version (pdf): This is a version of my dissertation that has been formatted according to basic principles of typography (or at least as best as I can do).
Please cite as:
@Unpublished{daniels:2005alt,
author = {Michael W. Daniels},
title = {Generalized {ID/LP} Grammar: A Formalism for Parsing
Linearization-Based {HPSG} Grammars},
url = {http://ling.osu.edu/~daniels/thesis.html},
note = {Alternately-formatted version of author's PhD thesis.},
year = 2005
}
If you have an interest in the source code, an extremely alpha version is available.
You need to have already installed the BOOST libraries. I used version 1.32.0. I expect later versions will work just as well, but I don't know about earlier versions.
The project currently links statically against GMP. Working with GMP in Windows is somewhat tricky, since GMP is partially written in assembler, and MSVC does not understand the dialect of assembler used. I've used Brian Gladman's port of GMP and included the resulting library files in the archive, so you might not need to install this on your own.
The included project files are for MSVC 7.1. While it should in theory be possible to use a different compiler to build the parser, I have not tried this.
There is currently no documentation for the GIDLP grammar format. In essence, GIDLP grammars are TDL files (the grammar syntax used in LKB and PET) with additional 'special' paths and types. These paths and types are declared in the file gidlp.tdl, which must be 'include'd by any GIDLP grammar. See the skeleton.tdl file for a minimal GIDLP grammar skeleton, and the other grammars/fsparser/*.tdl files for examples. Chapter 4 of my thesis explains the GIDLP grammar formalism.
The GIDLP parser does not use its own grammar compiler; it accepts the output of the grammar compiler flop, part of PET, which must be installed separately.
The parser executable takes two files as input, a compiled grammar file and a sentence file (one sentence per line), and writes XML to standard output. Typical invocation:
fsparser grammar.grm test.sents > results.xml
You can then open the results.xml file in an XML-aware browser (note that you will currently need to edit the XML file and change the hard-coded path in the first line).
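If you would rather drive the parser from a script, the invocation above can be wrapped roughly as follows. This is only a sketch: the helper names (`write_sentence_file`, `build_command`, `run_parser`) are hypothetical, the UTF-8 encoding of the sentence file is an assumption I have not checked against the parser, and `fsparser` is assumed to be on your PATH.

```python
import subprocess
from pathlib import Path


def write_sentence_file(sentences, path):
    """Write one sentence per line, the layout the parser expects."""
    # Collapse internal newlines and runs of whitespace so each sentence stays on one line.
    lines = [" ".join(s.split()) for s in sentences]
    # Assumption: UTF-8 is acceptable to the parser; adjust if it is not.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return Path(path)


def build_command(grammar, sentence_file, parser="fsparser"):
    """Return the argument list for: fsparser grammar.grm test.sents"""
    return [parser, str(grammar), str(sentence_file)]


def run_parser(grammar, sentence_file, output="results.xml", parser="fsparser"):
    """Run the parser and redirect its standard output into the XML file."""
    with open(output, "wb") as out:
        subprocess.run(build_command(grammar, sentence_file, parser),
                       stdout=out, check=True)
    return Path(output)
```

A call such as `run_parser("grammar.grm", write_sentence_file(["the dog barks"], "test.sents"))` would then correspond to the command line shown above. The hard-coded path in the first line of the resulting XML file still has to be edited by hand.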
The source archive can be downloaded here.
Contents:
| fsparser/ | source for GIDLP parser |
| libpet/ | source for my DLL version of the PET code |
| grammars/fsparser/ | sample grammars and test cases |
| parseview/ | currently contains only the XSLT transform for viewing parse results |
Pre-compiled binaries can be downloaded here. The fsparser.exe file generates only high-level output, while fsp-detail.exe generates more detailed output. (This will eventually be implemented via command-line switches.)
As mentioned above, this is a very rough version of the software; I expect there will be various hurdles to overcome before it can be considered portable. Please let me know if you encounter difficulties using the software; I may not be able to address them immediately, but I will keep track of them.
Instead of the OSU thesis class that's floating around, I developed a skeleton file based on the memoir class (so I could take advantage of that package's features). That file is available.