
I am a Ph.D. student in linguistics. My general academic interests include computational, mathematical, and theoretical linguistics, focusing on the intersection of logic, language, and information. You can also find photos, my software engineering portfolio, and other stuff about me over at coffeeblack.
My theoretical work mostly focuses on developing Convergent Grammar (CVG), a logical framework for describing the phonology/syntax/semantics/pragmatics interface, with Carl Pollard and others. I also work with Michael White on OpenCCG (an open-source parser and realizer for CCG).
The name Pep stands for "Pep is an Earley Parser," which is itself an example of direct left recursion. Pep is an implementation of Earley's chart-parsing algorithm in Java. It includes a thin command-line interface but is intended to be used as a library. Pep is free software, released under the GNU Lesser General Public License.
The tar bundle above contains Pep's binaries, full source code, generated documentation, and an Ant build file. It also includes several sample grammars for testing and automated JUnit tests.
Pep can parse strings specified by any CFG (including those that contain recursive rules). Version 0.4 is generalized to allow rules with right-hand sides that include a mix of terminals and nonterminals. Pep's charts use backpointers so that if a grammar allows ambiguity, Pep keeps track of all of the possible parses in a set of iterable parse trees.
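For readers unfamiliar with Earley's algorithm, here is a minimal recognizer sketch in Java. It is not Pep's code or API: the class `EarleySketch`, the record `Item`, and the method `recognizes` are hypothetical names. It assumes a grammar given as a map from each nonterminal to its alternative right-hand sides, treats any symbol that is not a key as a terminal, assumes no rule has an empty right-hand side, and needs Java 16 or later for records. Right-hand sides may mix terminals and nonterminals, and left-recursive rules such as `E -> E + T` still terminate because an item already in a chart set is never added again.

```java
import java.util.*;

// Minimal Earley recognizer (illustrative sketch; not Pep's actual implementation).
// Assumptions: symbols that are keys of the grammar map are nonterminals, all others
// are terminals, and no rule has an empty right-hand side.
final class EarleySketch {

    // An Earley item: rule lhs -> rhs, a dot position, and the chart index where it started.
    record Item(String lhs, List<String> rhs, int dot, int origin) {
        boolean complete() { return dot == rhs.size(); }
        String next() { return complete() ? null : rhs.get(dot); }
        Item advance() { return new Item(lhs, rhs, dot + 1, origin); }
    }

    static boolean recognizes(Map<String, List<List<String>>> grammar,
                              String start, List<String> tokens) {
        int n = tokens.size();
        // chart.get(i) keeps insertion order so it can be processed as a queue;
        // seen.get(i) rejects duplicate items, which is what stops left recursion looping.
        List<List<Item>> chart = new ArrayList<>();
        List<Set<Item>> seen = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            chart.add(new ArrayList<>());
            seen.add(new HashSet<>());
        }
        for (List<String> rhs : grammar.getOrDefault(start, List.of())) {
            add(chart, seen, 0, new Item(start, rhs, 0, 0));
        }
        for (int i = 0; i <= n; i++) {
            // The current set can grow while it is processed, so iterate by index.
            for (int k = 0; k < chart.get(i).size(); k++) {
                Item item = chart.get(i).get(k);
                String sym = item.next();
                if (sym == null) {
                    // Completer: advance items in the origin set that were waiting for item.lhs.
                    List<Item> originSet = chart.get(item.origin());
                    for (int j = 0; j < originSet.size(); j++) {
                        Item waiting = originSet.get(j);
                        if (item.lhs().equals(waiting.next())) {
                            add(chart, seen, i, waiting.advance());
                        }
                    }
                } else if (grammar.containsKey(sym)) {
                    // Predictor: start every rule for the expected nonterminal at position i.
                    for (List<String> rhs : grammar.get(sym)) {
                        add(chart, seen, i, new Item(sym, rhs, 0, i));
                    }
                } else if (i < n && sym.equals(tokens.get(i))) {
                    // Scanner: the expected terminal matches the next token.
                    add(chart, seen, i + 1, item.advance());
                }
            }
        }
        // Accept if some rule for the start symbol spans the whole input.
        for (Item item : chart.get(n)) {
            if (item.lhs().equals(start) && item.complete() && item.origin() == 0) {
                return true;
            }
        }
        return false;
    }

    private static void add(List<List<Item>> chart, List<Set<Item>> seen, int i, Item item) {
        if (seen.get(i).add(item)) {
            chart.get(i).add(item);
        }
    }
}
```

Pep's charts additionally keep backpointers so that every parse of an ambiguous string can be recovered. This sketch omits them and only answers whether the string is in the language; recording, for each advanced item, the completed item that justified the advance is one common way to extend it toward parse trees.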
Google's recently released Web 1T 5-gram Corpus contains so much data that many machines with average amounts of memory are unable to even load it. Funnel is a tool for filtering enormous LMs down to a more manageable size based on user-definable criteria, such as a limited vocabulary.
Custom filters can be specified by implementing a very simple interface with one method. Filters can also be chained in series, so the effects of one can be made to cascade to others. Funnel works with single-file count LMs as well as with the Google multiple-file format.
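The one-method filter interface and series chaining described above can be sketched in Java as follows. This is not Funnel's actual API: `NGramFilter`, `VocabularyFilter`, `MinCountFilter`, `FilterChain`, and `FunnelSketch.filter` are hypothetical names, and the sketch assumes each input line holds an n-gram, then a tab, then its count.

```java
import java.io.*;
import java.util.*;

// Hypothetical one-method filter interface (illustrative; not Funnel's real interface).
interface NGramFilter {
    boolean accept(List<String> words, long count);
}

// Keeps only n-grams whose every word is in a fixed vocabulary.
final class VocabularyFilter implements NGramFilter {
    private final Set<String> vocab;
    VocabularyFilter(Set<String> vocab) { this.vocab = Set.copyOf(vocab); }
    public boolean accept(List<String> words, long count) { return vocab.containsAll(words); }
}

// Keeps only n-grams seen at least minCount times.
final class MinCountFilter implements NGramFilter {
    private final long minCount;
    MinCountFilter(long minCount) { this.minCount = minCount; }
    public boolean accept(List<String> words, long count) { return count >= minCount; }
}

// Runs filters in series: an n-gram rejected by one filter never reaches the next.
final class FilterChain implements NGramFilter {
    private final List<NGramFilter> filters;
    FilterChain(NGramFilter... filters) { this.filters = List.of(filters); }
    public boolean accept(List<String> words, long count) {
        for (NGramFilter f : filters) {
            if (!f.accept(words, count)) return false;
        }
        return true;
    }
}

final class FunnelSketch {
    // Streams lines of the assumed form "w1 w2 ... wn<TAB>count" from in to out,
    // copying only those the filter accepts; returns the number of lines kept.
    static long filter(BufferedReader in, Writer out, NGramFilter filter) throws IOException {
        long kept = 0;
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) continue; // skip malformed lines
            List<String> words = Arrays.asList(line.substring(0, tab).split(" "));
            long count = Long.parseLong(line.substring(tab + 1).trim());
            if (filter.accept(words, count)) {
                out.write(line);
                out.write('\n');
                kept++;
            }
        }
        return kept;
    }
}
```

Reading and writing one line at a time keeps memory use independent of the size of the LM, which matters when the whole point is that the model is too large to load.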
Invited talk with Carl Pollard. Presented at the CAuLD workshop on Logical Methods for Discourse, December 14, 2009.
My second qualifying paper. Presented in Clippers, November 6, 2009.
Presentation of my ESSLLI 2008 paper. 13th ESSLLI Student Session. Hamburg, Germany, August 7, 2008.
A basic introduction to using Subversion. Part of the LCC tutorial series. May 14, 2007.
Brief discussion of the AJAX web development technique. Given in Clippers, January 19, 2007.
Graduate foundational course on the mathematical tools used in linguistic theory.
Broad-based introduction to linguistics. Course description.
Undergraduate-level introduction to computational linguistics. Course description.