
Scott Martin

Department of Linguistics, Ohio State University

Email
scott ling ohio-state edu
My public key
Address
1712 Neil Avenue
Columbus, OH 43210 USA
Phone
+1 614-292-8878

I am a Ph.D. candidate in linguistics. My general academic interests include computational, formal, and mathematical linguistics, particularly the syntax-semantics interface (dynamic semantics, paraphrase alignment) and morphosyntax (French clitics).

My formal work is mostly on developing a logical framework for describing the phonology/syntax/semantics/pragmatics interface, with Carl Pollard, Craige Roberts, and others. Lately, this work has focused on developing a natural language semantics that handles static, dynamic, and projective aspects of meaning.

I also work with Michael White on OpenCCG (an open-source parser and realizer for Combinatory Categorial Grammar, or CCG). Our main goals with this work are to use grammar engineering to improve the automatic alignment of paraphrases and to improve machine translation (MT) evaluation by generating additional high-quality reference sentences.

Here's my CV. Google Scholar has an author profile for me, and you can also find photos, my software engineering portfolio, and other stuff about me over at coffeeblack.

Publications & Talks

On Categorial Grammar and Dynamic Semantics

On Natural Language Generation, Grammar Engineering, and Machine Translation/Paraphrasing

Tutorials

Teaching

Graduate

680: Formal Foundations of Linguistic Theory
(Assistant to Carl Pollard.) Foundational course on the mathematical tools used in formal linguistics.
602.01: Syntax 1
(Assistant to Bob Levine.) Overview of syntactic theory and description based on HPSG.
  • Autumn 2011

Undergraduate

384: Language and Computers
Broad-based overview of topics in computational linguistics.
280: Language and Formal Reasoning
Truth-conditional meaning in natural language and its interaction with deductive reasoning.
201: Introduction to Language
Survey course in general linguistics.

Software

PEP

The name PEP stands for "PEP is an Earley Parser", which is itself an example of direct left recursion. PEP is an implementation of Earley's chart-parsing algorithm in Java. It includes a thin command-line interface, but is intended to be used as a library. PEP is free software released under the GNU Lesser General Public License.

PEP source and binaries
Version 0.4
Signature
generated using my public key
API Documentation
generated by JavaDoc

The tar bundle above contains PEP's binaries, full source code, generated documentation, and an Ant build file. It also includes several sample grammars for testing and automated JUnit tests.

PEP can parse strings licensed by any context-free grammar (CFG), including grammars that contain recursive rules. PEP's charts use backpointers, so if a grammar allows ambiguity, PEP keeps track of all possible parses as a set of traversable parse trees. Version 0.4 is generalized to allow rules whose right-hand sides mix terminals and nonterminals.

As an example, if the file duck.xml specifies the following CFG,

S → NP VP
VP → VT NP
VP → VS S
VS → saw
VT → saw
NP → Mary
NP → Det N
Det → her
NP → her
N → duck
VP → duck

then PEP can be invoked to parse the string Mary saw her duck as follows:

$ pep -g duck.xml -s S "Mary saw her duck"
ACCEPT: S -> [Mary, saw, her, duck] (2)
1. [S[NP[Mary]][VP[VT[saw]][NP[Det[her]][N[duck]]]]]
2. [S[NP[Mary]][VP[VS[saw]][S[NP[her]][VP[duck]]]]]

Here, the -s S argument tells PEP to parse for category S. The output says that the string is accepted, then gives the two parse trees licensed by the ambiguous grammar.
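
For readers who want to see the idea behind the chart, here is a minimal sketch of Earley recognition over the duck grammar above. It is not PEP's source code or API: the class and method names (EarleySketch, Rule, Item, recognize, duckGrammar) are made up for illustration, the input is assumed to be split on spaces, and rules with empty right-hand sides are assumed not to occur. It only answers accept or reject and does not keep the backpointers PEP uses to recover parse trees. It uses records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an Earley recognizer; not PEP's actual classes or API.
class EarleySketch {
    // A rule such as S -> NP VP. Right-hand sides may mix terminals and nonterminals.
    record Rule(String lhs, List<String> rhs) {}

    // An Earley item: a rule, a dot position in its right-hand side, and the
    // chart column where recognition of the rule started.
    record Item(Rule rule, int dot, int origin) {
        boolean complete() { return dot == rule.rhs().size(); }
        String next() { return rule.rhs().get(dot); }
    }

    static Rule rule(String lhs, String... rhs) { return new Rule(lhs, List.of(rhs)); }

    // The grammar from the duck.xml example, written out by hand.
    static List<Rule> duckGrammar() {
        return List.of(
            rule("S", "NP", "VP"), rule("VP", "VT", "NP"), rule("VP", "VS", "S"),
            rule("VS", "saw"), rule("VT", "saw"), rule("NP", "Mary"),
            rule("NP", "Det", "N"), rule("Det", "her"), rule("NP", "her"),
            rule("N", "duck"), rule("VP", "duck"));
    }

    // Adds an item to a column unless it is already there; this is what keeps
    // left-recursive rules from looping forever.
    static void add(List<Item> column, Item item) {
        if (!column.contains(item)) column.add(item);
    }

    // Returns true if the tokens can be derived from the start category.
    // Assumes no rule has an empty right-hand side.
    static boolean recognize(List<Rule> grammar, String start, List<String> tokens) {
        Set<String> nonterminals = new HashSet<>();
        for (Rule r : grammar) nonterminals.add(r.lhs());

        int n = tokens.size();
        List<List<Item>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) chart.add(new ArrayList<>());
        for (Rule r : grammar)
            if (r.lhs().equals(start)) add(chart.get(0), new Item(r, 0, 0));

        for (int i = 0; i <= n; i++) {
            List<Item> column = chart.get(i);
            // Index-based loop because the column grows while it is processed.
            for (int k = 0; k < column.size(); k++) {
                Item item = column.get(k);
                if (item.complete()) {
                    // Completer: advance every item that was waiting for this category.
                    for (Item parent : List.copyOf(chart.get(item.origin())))
                        if (!parent.complete() && parent.next().equals(item.rule().lhs()))
                            add(column, new Item(parent.rule(), parent.dot() + 1, parent.origin()));
                } else if (nonterminals.contains(item.next())) {
                    // Predictor: start every rule for the category after the dot.
                    for (Rule r : grammar)
                        if (r.lhs().equals(item.next())) add(column, new Item(r, 0, i));
                } else if (i < n && item.next().equals(tokens.get(i))) {
                    // Scanner: the terminal after the dot matches the next token.
                    add(chart.get(i + 1), new Item(item.rule(), item.dot() + 1, item.origin()));
                }
            }
        }
        for (Item item : chart.get(n))
            if (item.complete() && item.origin() == 0 && item.rule().lhs().equals(start))
                return true;
        return false;
    }

    public static void main(String[] args) {
        List<Rule> g = duckGrammar();
        System.out.println(recognize(g, "S", List.of("Mary saw her duck".split(" ")))); // true
        System.out.println(recognize(g, "S", List.of("Mary saw".split(" "))));          // false
    }
}
```

Extending this sketch to return parse trees would mean recording, for each item added by the completer or scanner, which items produced it, which is roughly the role the backpointers play in PEP's charts.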

Funnel

Google's recently released Web 1T 5-gram Corpus contains so much data that many machines with average amounts of memory cannot even load it. Funnel is a free tool (released under the GPL) for filtering enormous language models (LMs) down to a more manageable size based on user-definable criteria, such as a limited vocabulary.

Funnel source and binaries
Version 0.1
Signature
generated using my public key

Custom filters can be specified by implementing a very simple interface with one method. Filters can also be chained in series, so the effects of one can be made to cascade to others. Funnel works with single-file count LMs as well as with the Google multiple-file format.
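
To give a feel for the single-method, chainable design described above, here is a rough sketch in Java. Every name in it (FilterSketch, NGram, NGramFilter, accept, then, apply) is hypothetical and chosen for illustration; it is not Funnel's actual interface, and it models chaining as "an entry survives only if every filter in the series accepts it", which is an assumption about what cascading means here. It uses records, so it assumes Java 16 or later.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical names throughout; this is not Funnel's actual interface.
class FilterSketch {
    // One entry of a count LM: the n-gram's tokens and its observed count.
    record NGram(List<String> tokens, long count) {}

    // A filter with a single abstract method, so a lambda can implement it.
    interface NGramFilter {
        boolean accept(NGram ngram);

        // Chains two filters in series: the next filter is consulted only
        // for entries that this filter has already accepted.
        default NGramFilter then(NGramFilter next) {
            return ngram -> accept(ngram) && next.accept(ngram);
        }
    }

    // Keeps only the entries that the filter accepts, preserving their order.
    static List<NGram> apply(NGramFilter filter, List<NGram> entries) {
        return entries.stream().filter(filter::accept).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> vocabulary = Set.of("Mary", "saw", "her", "duck");

        // Two custom filters: a limited vocabulary and a minimum count.
        NGramFilter inVocabulary = ngram -> vocabulary.containsAll(ngram.tokens());
        NGramFilter frequentEnough = ngram -> ngram.count() >= 40;

        // Made-up counts, for illustration only.
        List<NGram> entries = List.of(
            new NGram(List.of("her", "duck"), 120),
            new NGram(List.of("her", "goose"), 300),
            new NGram(List.of("saw", "her"), 15));

        // Only "her duck" passes both filters.
        for (NGram kept : apply(inVocabulary.then(frequentEnough), entries))
            System.out.println(String.join(" ", kept.tokens()) + "\t" + kept.count());
    }
}
```

Keeping the interface down to one method means a new criterion can be written as a one-line lambda, and the chaining helper lets several criteria be combined without any of them knowing about the others.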