In this course, students will learn the basics of probabilistic modeling and machine learning for natural language processing and computational linguistics. Along the way, students will gain experience with using the Python programming language to analyze corpus data.
The course will be based primarily on the second edition of Jurafsky and Martin's textbook, Speech and Language Processing. We will also work hands-on with Bird, Klein and Loper's book, Natural Language Processing with Python, based on the Natural Language Toolkit (NLTK).
Students in the course will have the opportunity to:
Topics will include:
Time permitting, we may also look at semantic role labeling, machine translation, or statistical parsing with Combinatory Categorial Grammars (CCGs).
Prerequisite: Ling 684.01 or equivalent. The course is open to advanced undergraduate and graduate students.
Letter grades will be assigned using the standard OSU scale based on class participation and homework assignments.
You will be expected to keep up with the readings and actively participate in class discussions and activities.
There will be four regular homework assignments, with the lowest score dropped in calculating the grade. Homework assignments are generally due by the beginning of class, in the Carmen dropbox. Late homework will not be accepted without prior notice of a justifiable delay.
I encourage group work on the homework assignments, but each of you should write out your own answers. Note that group work means that everyone in the group contributes and that each of you fully understands what you turn in.
Each student will design their own final project. A brief description of the project must be submitted for approval in Week 6. Projects are required to include a quantitative evaluation. During the last week of classes, students will present lessons learned from the project to the class. The project write-up will be due on the date the final exam would normally be held.
The final project is not expected to be novel research. Instead, it is expected to require the same level of effort as one of the homework assignments, with the level of difficulty on a par with the advanced exercises in the Jurafsky and Martin textbook. As the final project requires both design and presentation activities, it carries double the weight of one of the homework assignments. A typical project might involve attempting to replicate the results reported in the textbook or the research literature on an NLP task using your own implementation.
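To make the weighting concrete, here is a minimal Python sketch of how the homework and project scores might combine, assuming every score is on a 0-100 scale. The function name assignment_score is hypothetical, and class participation is left out because this syllabus does not state its weight, so the result is only the assignment component of the grade, not the final letter grade.

```python
def assignment_score(homework_scores, project_score):
    """Combine four homework scores and one project score (assumed 0-100 each).

    Illustrative only: participation is not modeled, since its weight
    is not stated in the syllabus.
    """
    if len(homework_scores) != 4:
        raise ValueError("expected exactly four homework scores")
    # Drop the lowest of the four homework scores.
    kept = sorted(homework_scores)[1:]
    # The project carries double the weight of one homework assignment.
    total_weight = len(kept) + 2
    return (sum(kept) + 2 * project_score) / total_weight


if __name__ == "__main__":
    # Homework scores of 80, 90, 70 and 100, with a project score of 85.
    print(assignment_score([80, 90, 70, 100], 85))  # prints 88.0
```

In this example the 70 is dropped, so the three remaining homework scores and the double-weighted project are averaged over five weight units.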
We'll be using the Carmen system for the schedule and for homework and reading assignments. There will also be discussion forums for posting questions and providing feedback (comments, complaints or ideas) during the course, anonymously if desired.
As with any class at this university, students are required to follow the Ohio State Code of Student Conduct. In particular, note that students are not allowed to, among other things, submit plagiarized (copied but unacknowledged) work for credit. If any violation occurs, I am required to report the violation to the Council on Academic Misconduct.
Students who need an accommodation based on the impact of a disability should contact me to arrange an appointment as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations. I rely on the Office for Disability Services for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted the Office for Disability Services are encouraged to do so (292-3307; http://www.ods.ohio-state.edu).
This syllabus is subject to change. All important changes will be made in writing (email), with ample time for adjustment.