I am a Ph.D. student in linguistics. My academic interests include computational and theoretical linguistics, with particular focus on the intersection of logic, language, and information. I am listed on the OSU Linguistics web site. You can also find photos, my software engineering portfolio, and other stuff about me over at coffeeblack.
My theoretical work mostly focuses on developing Convergent Grammar (CVG), a proof-theoretic formalism, with Carl Pollard and others. I also work with Michael White on OpenCCG (an open-source parser and realizer for CCG). My work on OpenCCG includes building grammardoc, an application that generates HTML documentation from OpenCCG’s grammar specifications; integrating the SRI Language Modeling Toolkit for n-gram scoring; and developing ccgbankextract, a tool for corpus-based grammar extraction.
I spend my copious free time riding with the OSU Cycling Club and serving as treasurer for the Student Linguistics Association.
The name Pep stands for "Pep is an Earley Parser," which is itself an example of direct left recursion. Pep is an implementation of Earley's chart-parsing algorithm in Java. It includes a thin command-line interface, but is intended to be used as a library. Pep is free software, released under the GNU Lesser General Public License.
The tar bundle above contains Pep's binaries, full source code, generated documentation, and an Ant build file. It also includes several sample grammars for testing and automated JUnit tests.
Pep can parse strings specified by any context-free grammar (CFG), including grammars that contain recursive rules and ε-productions, provided that terminals occur only alone on the right-hand side of a rule. Pep's charts use backpointers, so if a grammar allows ambiguity, Pep keeps track of all of the possible parses in a set of iterable parse trees.
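To make the chart-parsing idea concrete, here is a minimal sketch of an Earley recognizer in Java. It is not Pep's code or API: the class `EarleySketch`, the `Item` record, and the `recognize` method are names invented for illustration, and the sketch assumes Java 16 or later (for records). It only answers whether a string is in the language; it keeps no backpointers and builds no parse trees. Prediction and completion are repeated until the chart set stops growing, which is one simple way to cope with ε-productions.

```java
import java.util.*;

public class EarleySketch {
    // Dotted rule "lhs -> rhs" with the dot before rhs[dot]; origin is where the rule started.
    record Item(String lhs, List<String> rhs, int dot, int origin) {
        boolean complete() { return dot == rhs.size(); }
        String next() { return rhs.get(dot); }
        Item advance() { return new Item(lhs, rhs, dot + 1, origin); }
    }

    // Nonterminals are the keys of the grammar map; every other symbol is a terminal.
    static boolean recognize(Map<String, List<List<String>>> grammar,
                             String start, List<String> tokens) {
        int n = tokens.size();
        List<Set<Item>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) chart.add(new LinkedHashSet<>());
        for (List<String> rhs : grammar.getOrDefault(start, List.of()))
            chart.get(0).add(new Item(start, rhs, 0, 0));

        for (int i = 0; i <= n; i++) {
            Set<Item> set = chart.get(i);
            boolean changed = true;
            while (changed) { // repeat predict/complete until no new item appears
                changed = false;
                for (Item it : new ArrayList<>(set)) {
                    if (it.complete()) {
                        // Completer: advance every item that was waiting for it.lhs
                        for (Item parent : new ArrayList<>(chart.get(it.origin())))
                            if (!parent.complete() && parent.next().equals(it.lhs()))
                                changed |= set.add(parent.advance());
                    } else if (grammar.containsKey(it.next())) {
                        // Predictor: start every rule for the expected nonterminal here
                        for (List<String> rhs : grammar.get(it.next()))
                            changed |= set.add(new Item(it.next(), rhs, 0, i));
                    }
                }
            }
            if (i < n) // Scanner: move the dot over a matching terminal
                for (Item it : set)
                    if (!it.complete() && !grammar.containsKey(it.next())
                            && it.next().equals(tokens.get(i)))
                        chart.get(i + 1).add(it.advance());
        }
        for (Item it : chart.get(n))
            if (it.complete() && it.origin() == 0 && it.lhs().equals(start)) return true;
        return false;
    }

    public static void main(String[] args) {
        // Left-recursive toy grammar: S -> S Plus N | N, Plus -> "+", N -> "n"
        Map<String, List<List<String>>> g = Map.of(
            "S", List.of(List.of("S", "Plus", "N"), List.of("N")),
            "Plus", List.of(List.of("+")),
            "N", List.of(List.of("n")));
        System.out.println(recognize(g, "S", List.of("n", "+", "n"))); // prints true
        System.out.println(recognize(g, "S", List.of("n", "+")));      // prints false
    }
}
```

The toy grammar in `main` follows the restriction described above (each terminal appears alone on a right-hand side), although the sketch itself does not depend on it. Turning this into a parser would mean recording, for each advanced item, which completed item or token licensed the advance.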
Google's recently released Web 1T 5-gram Corpus contains so much data that many machines with average amounts of memory are unable to even load it. Funnel is a tool for filtering enormous language models (LMs) down to a more manageable size based on user-definable criteria, such as a limited vocabulary.
Custom filters can be specified by implementing a very simple interface with one method. Filters can also be chained in series, so the effects of one can be made to cascade to others. Funnel works with single-file count LMs as well as with the Google multiple-file format.
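As an illustration of the one-method filter idea, here is a hedged sketch in Java. None of these names come from Funnel: `NgramFilter`, `then`, `vocabulary`, `minCount`, and `filterLines` are hypothetical, and the input is assumed to be a count file with one n-gram per line, words separated by spaces and followed by a tab and the count. Chaining is modeled here as a simple conjunction in which a later filter only sees n-grams that an earlier one accepted; Funnel's own cascading may work differently.

```java
import java.util.*;

public class FilterChainSketch {
    /** Hypothetical one-method filter interface; not Funnel's actual API. */
    @FunctionalInterface
    interface NgramFilter {
        boolean accept(List<String> words, long count);

        /** Chains two filters in series: next is consulted only if this one accepts. */
        default NgramFilter then(NgramFilter next) {
            return (words, count) -> accept(words, count) && next.accept(words, count);
        }
    }

    /** Keeps only n-grams whose words all belong to the given vocabulary. */
    static NgramFilter vocabulary(Set<String> vocab) {
        return (words, count) -> vocab.containsAll(words);
    }

    /** Keeps only n-grams observed at least min times. */
    static NgramFilter minCount(long min) {
        return (words, count) -> count >= min;
    }

    /** Applies a filter to lines of the assumed form "w1 w2 ... wn<TAB>count". */
    static List<String> filterLines(List<String> lines, NgramFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) continue; // skip lines that do not match the assumed format
            List<String> words = Arrays.asList(line.substring(0, tab).split(" "));
            long count = Long.parseLong(line.substring(tab + 1).trim());
            if (filter.accept(words, count)) kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "the cat\t120", "the dog\t80", "a zebra\t500", "the cat sat\t30");
        NgramFilter filter =
            vocabulary(Set.of("the", "cat", "dog", "sat")).then(minCount(50));
        // Keeps "the cat" and "the dog"; drops the out-of-vocabulary and low-count lines.
        filterLines(lines, filter).forEach(System.out::println);
    }
}
```

Processing one line at a time, as `filterLines` does, is what would let such a filter run over data too large to hold in memory; a real tool would stream from files rather than from an in-memory list.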
Presentation of my ESSLLI 2008 paper. 13th ESSLLI Student Session. Hamburg, Germany, August 7, 2008.
A basic introduction to using Subversion. Part of the LCC tutorial series. May 14, 2007.
Brief discussion of the AJAX web development technique. Given in Clippers, January 19, 2007.
Broad-based introduction to linguistics. Course description.
Undergraduate-level introduction to computational linguistics. Course description.