Languages for computational linguistics

Languages for doing computational work can be divided into two broad categories: declarative languages and procedural languages. Most work in industry is done in Perl and C++; Java can be expected to have a growing role as time goes on.


Perl is popular for its phenomenal support for string-handling tasks of all kinds. The best books for learning Perl are Learning Perl and Perl for Dummies.

Use them together: Learning Perl will help you understand Perl, and Perl for Dummies will let you actually be effective in it. If you're an impoverished graduate student and can only afford one, buy Perl for Dummies.

There's also a new book by Michael Hammond, Programming for Linguists: Perl for Language Researchers. I haven't read it, so I can't recommend it one way or the other.

Perl really is quite wonderful for linguists. For some examples of what you can do with it, see:


Java shows some promise as a replacement for C++ in building some linguistic applications. Although it doesn't quite match the ease of string-handling that Perl delivers, many string manipulations are certainly easier in Java than in C++. Also, a good Java regular expression engine is available, which is a big help in bringing Java closer to the level of Perl. Update: Java 1.4 includes a regular expression package, java.util.regex. The best documentation of it is probably in David Flanagan's Java in a Nutshell, 4th ed. Also see the second edition of Jeffrey Friedl's Mastering Regular Expressions (be sure to get the second edition; Java isn't covered in the first), and Sun's pages on the package.
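To give a feel for the regular expression package, here is a minimal sketch of the kind of thing a linguist might do with it: pulling every word that ends in "ing" out of a piece of text. The class name SuffixFinder and the method findIngWords are made up for this illustration, and the code uses generics, which arrived in Java after 1.4; the Pattern and Matcher classes themselves are the ones from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not from any of the books mentioned here.
class SuffixFinder {
    // \b is a word boundary and \w+ is one or more word characters,
    // so this matches whole words that end in "ing".
    private static final Pattern ING_WORD = Pattern.compile("\\b\\w+ing\\b");

    // Returns every match in the text, in order of appearance.
    static List<String> findIngWords(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = ING_WORD.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [king, singing, dancing, evening]
        System.out.println(findIngWords("The king was singing and dancing all evening."));
    }
}
```

Note that "king" and "evening" come back along with the participles: a pattern like this matches spelling, not morphology, so the output of a search like this usually needs a second look.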

Java is very strong on handling non-ASCII character sets; if your language of interest requires Unicode, you should definitely check out Java.
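As a small illustration of the Unicode support, the sketch below counts the letters in a string without assuming ASCII. The class name UnicodeLetters and the method countLetters are made up for this example. It walks the string by code point rather than by char, which assumes Java 5 or later, so that characters outside the Basic Multilingual Plane are counted once rather than twice. The non-ASCII characters are written as \u escapes so the source file's encoding doesn't matter.

```java
// Hypothetical example class for illustration only.
class UnicodeLetters {
    // Counts the Unicode letters in the text, one per code point.
    static int countLetters(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                count++;
            }
            // A supplementary character occupies two chars; step over both.
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // "na\u00EFve caf\u00E9" is "naive cafe" with i-diaeresis and e-acute; prints 9
        System.out.println(countLetters("na\u00EFve caf\u00E9"));
        // Four Cyrillic letters; prints 4
        System.out.println(countLetters("\u044F\u0437\u044B\u043A"));
    }
}
```

Character.isLetter consults the Unicode character database, so the same method works unchanged for Cyrillic, Greek, accented Latin, and so on.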

Michael Hammond has a new book, Programming for Linguists: Java Technology for Language Researchers, that you should check out. The Java it uses is somewhat out of date; the book explains why.

I provide some links here to various documents that address linguistic functionality in Java. I also have some squibs here that I've written that serve as tutorials on the sorts of things that a linguist would want to do with Java. Note that these tutorials assume that you already have some basic familiarity with Java. If that's not you, see Java in a Nutshell, by David Flanagan. If the Flanagan book is more than you can handle at this point, try Java2 Fast and Easy Web Development or Elizabeth Castro's book (sorry, no URL).

Declarative languages

Your basic choices here are LISP or Prolog. A good beginning LISP book: The Little LISPer. Look for it before you actually need it, because most places will have to special-order it for you. To really learn Lisp, get Paul Graham's ANSI Common Lisp. When you're ready for linguistic programming, check out Gazdar and Mellish's Natural Language Processing in LISP: An Introduction to Computational Linguistics and Peter Norvig's Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp.

Sundry LISP sites:

Some Prolog sites:

K. Bretonnel Cohen's home page