
Scott Martin

scott ling osu edu
My GPG public key
mailing address
204 Oxley Hall
1712 Neil Avenue
Columbus, OH 43210
phone
614-688-3108
fax
614-292-8833

About Me

I am a Ph.D. student in linguistics. My academic interests include computational and theoretical linguistics, with a particular focus on the intersection of logic, language, and information. I am listed on the OSU Linguistics web site. You can also find photos, my software engineering portfolio, and other stuff about me over at coffeeblack.

My theoretical work mostly focuses on developing Convergent Grammar (CVG), a proof-theoretic formalism, with Carl Pollard and others. I also work with Michael White on OpenCCG, an open-source parser and realizer for CCG. My work on OpenCCG includes building grammardoc, an application that generates HTML documentation from OpenCCG’s grammar specifications; integrating the SRI Language Modeling Toolkit for n-gram scoring; and developing ccgbankextract, a tool for corpus-based grammar extraction.

I spend my copious free time riding with the OSU Cycling Club and serving as treasurer for the Student Linguistics Association.

Projects

Pep

The name Pep stands for "Pep is an Earley Parser", which is itself an example of direct left recursion. Pep is an implementation of Earley's chart-parsing algorithm in Java. It includes a thin command-line interface but is intended to be used as a library. Pep is free software, released under the GNU Lesser General Public License.

Pep source and binaries
Pep version 0.3
GPG signature
My public key

The tar bundle above contains Pep's binaries, full source code, generated documentation, and an Ant build file. It also includes several sample grammars for testing and automated JUnit tests.

Pep can parse strings specified by any CFG (including those that contain recursive rules and ε-productions), provided that each terminal occurs alone on the right-hand side of a rule. Pep's charts use backpointers, so if a grammar allows ambiguity, Pep keeps track of all the possible parses in a set of iterable parse trees.
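To make the idea concrete, here is a minimal sketch of an Earley recognizer in Java. It is not Pep's code or API: the names EarleySketch, Rule, Item, rule, word, and recognize are all made up for illustration. It only answers yes or no, so it omits the backpointers and parse trees described above. It assumes the same grammar restriction (a terminal appears only as the sole symbol on a rule's right-hand side) and handles ε-productions by naively re-running each chart column until nothing new is added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Illustrative Earley recognizer; all names here are hypothetical, not Pep's API. */
final class EarleySketch {
    /** Either LHS -> nonterminals (an empty RHS is an epsilon rule) or LHS -> one terminal. */
    static final class Rule {
        final String lhs; final List<String> rhs; final boolean lexical;
        Rule(String lhs, List<String> rhs, boolean lexical) {
            this.lhs = lhs; this.rhs = rhs; this.lexical = lexical;
        }
    }

    /** A dotted rule plus the input position where its recognition began. */
    static final class Item {
        final Rule rule; final int dot; final int origin;
        Item(Rule rule, int dot, int origin) { this.rule = rule; this.dot = dot; this.origin = origin; }
        boolean complete() { return dot == rule.rhs.size(); }
        String next() { return rule.rhs.get(dot); }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Item)) return false;
            Item other = (Item) o;
            return rule == other.rule && dot == other.dot && origin == other.origin;
        }
        @Override public int hashCode() { return Objects.hash(System.identityHashCode(rule), dot, origin); }
    }

    static Rule rule(String lhs, String... rhs) { return new Rule(lhs, Arrays.asList(rhs), false); }
    static Rule word(String lhs, String terminal) { return new Rule(lhs, Arrays.asList(terminal), true); }

    static boolean recognize(List<Rule> grammar, String start, List<String> tokens) {
        int n = tokens.size();
        List<Set<Item>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) chart.add(new LinkedHashSet<>());
        for (Rule r : grammar) if (r.lhs.equals(start)) chart.get(0).add(new Item(r, 0, 0));

        for (int i = 0; i <= n; i++) {
            Set<Item> column = chart.get(i);
            boolean changed = true;
            while (changed) { // repeat until no new items appear, which covers epsilon rules
                changed = false;
                for (Item item : new ArrayList<>(column)) {
                    if (item.complete()) {
                        // Completer: advance items in the origin column that were waiting for this LHS.
                        for (Item waiting : new ArrayList<>(chart.get(item.origin))) {
                            if (!waiting.complete() && !waiting.rule.lexical
                                    && waiting.next().equals(item.rule.lhs)) {
                                changed |= column.add(new Item(waiting.rule, waiting.dot + 1, waiting.origin));
                            }
                        }
                    } else if (item.rule.lexical) {
                        // Scanner: match the rule's single terminal against the next input token.
                        if (i < n && item.next().equals(tokens.get(i))) {
                            chart.get(i + 1).add(new Item(item.rule, 1, item.origin));
                        }
                    } else {
                        // Predictor: add every rule for the nonterminal after the dot.
                        for (Rule r : grammar) {
                            if (r.lhs.equals(item.next())) changed |= column.add(new Item(r, 0, i));
                        }
                    }
                }
            }
        }
        // Accept if a start rule spans the whole input.
        for (Item item : chart.get(n)) {
            if (item.complete() && item.origin == 0 && item.rule.lhs.equals(start)) return true;
        }
        return false;
    }
}
```

With the grammar S → S A, S → ε, A → a (built as rule("S", "S", "A"), rule("S"), word("A", "a")), recognize accepts any sequence of a tokens, including the empty one. That exercises both left recursion and an ε-production. A real parser would also record, for each item, which completed items advanced it, so that trees can be read back out of the chart.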

Funnel

Google's recently released Web 1T 5-gram Corpus contains so much data that many machines with average amounts of memory cannot even load it. Funnel is a tool for filtering enormous language models (LMs) down to a more manageable size based on user-definable criteria, such as a limited vocabulary.

Funnel source and binaries
Funnel version 0.1
GPG signature
My public key

Custom filters can be specified by implementing a very simple interface with one method. Filters can also be chained in series, so the effects of one can be made to cascade to others. Funnel works with single-file count LMs as well as with the Google multiple-file format.
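As a rough illustration of what a one-method filter and chaining might look like, here is a hypothetical sketch in Java. It is not Funnel's actual interface: NgramFilter, accept, then, and the FilterSketch helpers are invented names. The sketch assumes that chaining means an n-gram is kept only if every filter in the series accepts it, and that each line of a count file has the form "w1 w2 ... wn", a tab, then the count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical one-method filter interface; not Funnel's actual API. */
interface NgramFilter {
    /** Returns true if the n-gram with the given count should be kept. */
    boolean accept(List<String> ngram, long count);

    /** Chains two filters in series: the n-gram is kept only if both accept it (an assumption). */
    default NgramFilter then(NgramFilter next) {
        return (ngram, count) -> accept(ngram, count) && next.accept(ngram, count);
    }
}

final class FilterSketch {
    /** Keeps only n-grams whose words all belong to the given vocabulary. */
    static NgramFilter vocabulary(Set<String> vocab) {
        return (ngram, count) -> vocab.containsAll(ngram);
    }

    /** Keeps only n-grams observed at least minCount times. */
    static NgramFilter minCount(long minCount) {
        return (ngram, count) -> count >= minCount;
    }

    /** Applies a filter to one count-file line, assumed to be "w1 w2 ... wn<TAB>count". */
    static boolean keepLine(String line, NgramFilter filter) {
        int tab = line.lastIndexOf('\t');
        List<String> ngram = Arrays.asList(line.substring(0, tab).split(" "));
        long count = Long.parseLong(line.substring(tab + 1).trim());
        return filter.accept(ngram, count);
    }
}
```

Under these assumptions, a limited-vocabulary filter chained with a count threshold would be written as FilterSketch.vocabulary(vocab).then(FilterSketch.minCount(40)) and applied line by line while streaming through the count files, so the full model never has to sit in memory.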

Publications

In Proceedings

Talks

A Proof-theoretic Approach to FPCs

Presentation of my ESSLLI 2008 paper. 13th ESSLLI Student Session. Hamburg, Germany, August 7, 2008.

PDF Slides

Subversion Tutorial

A basic introduction to using Subversion. Part of the LCC tutorial series. May 14, 2007.

PDF Slides

AJAX Overview

Brief discussion of the AJAX web development technique. Given in Clippers, January 19, 2007.

PDF Slides
Source code
A simple example implementation.

Teaching

Linguistics 201: Introduction to Language in the Humanities

Broad-based introduction to linguistics. Course description.

Linguistics 384: Language and Computers

Undergraduate-level introduction to computational linguistics. Course description.