Linguistics 384: Language and Computers
Course goals: In the past decade, the widening use of computers has had
a profound influence on the way ordinary people communicate, search and store
information. For the overwhelming majority of people and situations, the
natural vehicle for such information is natural language. Text and to a lesser
extent speech are crucial encoding formats for the information revolution.
In this course, you will be given insight into the fundamentals of how
computers are used to represent, process and organize textual and spoken
information, as well as tips on how to effectively integrate this knowledge
into working practice. We will cover the theory and practice of human language
technology. Topics include text encoding, search technology, tools for writing
support, machine translation, dialog systems, computer-aided language learning
and the social context of language technology.
GEC: The course satisfies the GEC category 2B (Mathematical and Logical
Analysis). It does so by using natural language systems to motivate students to
exercise and develop a range of basic skills in formal and computational
analysis. The course philosophy is to ground abstract concepts in real-world
examples. We introduce strings, regular expressions, finite-state and
context-free grammars, as well as algorithms defined over these structures and
techniques for probing and evaluating systems that rely on these algorithms.
The course goes beyond merely subjective evaluation of systems, emphasizing analysis
and reasoning to draw and argue for valid conclusions about the design,
capabilities and behavior of natural language systems.
Instructor: María Aránzazu Martín-Lozano (Arantxa)
Course meets: Tuesdays 3:30-5:18pm in 209 Central Classroom and
Thursdays 3:30-5:18pm in 345 Central Classroom
Carmen: We'll be using the Carmen course management tool for
the course, which is accessible at http://carmen.osu.edu.
You'll use it to
Note that email from Carmen
is sent to the official email addresses (Name.Number@osu.edu) of the
students enrolled in the class and of the instructor. You should read email sent
to your official OSU account daily.
I will distribute slides in class for each unit. These will also be available
on the web after the class in which they are first distributed. These slides
are only a skeleton of the material covered; they cannot replace actually being
in class.
Course requirements: The basic requirement is regular attendance in
class and active participation. There will be roughly one online quiz
per topic, to ensure that you have mastered the material covered in class. There
will also be one homework (exercise sheet) per topic, intended to give you
the opportunity to explore new aspects of the topics discussed in class. The midterm
will cover the material from the first half of the course, and the final
will cover the material from the second half.
Grading: Grades will be based on participation in classroom discussion
and group work, quizzes, homeworks, a midterm exam, and the final examination,
using the following scheme:
| Component | Weight |
| Participation | 10% |
| Quizzes | 20% |
| Homeworks | 30% |
| Midterm | 20% |
| Final | 20% |
Note: I will not remind you when you have a quiz due. It is
your responsibility to keep up with the
syllabus/calendar of the course.
| Letter grade | Percentage |
| A | 93-100 |
| A- | 90-92 |
| B+ | 87-89 |
| B | 83-86 |
| B- | 80-82 |
| C+ | 77-79 |
| C | 73-76 |
| C- | 70-72 |
| D+ | 67-69 |
| D | 60-66 |
| E | 0-59 |
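As an illustration only, the arithmetic behind the weights and letter-grade ranges above can be worked through in a short Python sketch. The sample scores, the function names, and the round-half-up treatment of fractional percentages are assumptions made for the example, not course policy.

```python
import math

# Weights from the grading scheme above, as fractions of the course total.
WEIGHTS = {
    "participation": 0.10,
    "quizzes": 0.20,
    "homeworks": 0.30,
    "midterm": 0.20,
    "final": 0.20,
}

# Lower bound of each letter-grade range above, highest first.
CUTOFFS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (60, "D"),
    (0, "E"),
]


def weighted_score(scores):
    """Combine per-component percentages (0-100) into one course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a course percentage to a letter grade.

    Assumption: fractional percentages are rounded half up to a whole
    number before the ranges are applied; the syllabus does not say.
    """
    rounded = math.floor(percentage + 0.5)
    for lower_bound, letter in CUTOFFS:
        if rounded >= lower_bound:
            return letter
    return "E"


# Hypothetical student, scores given as percentages per component.
example = {
    "participation": 100,
    "quizzes": 85,
    "homeworks": 90,
    "midterm": 78,
    "final": 82,
}
total = weighted_score(example)
print(f"{total:.1f} -> {letter_grade(total)}")
```

For this hypothetical student the components contribute 10 + 17 + 27 + 15.6 + 16.4 = 86 points, which falls in the B range.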
Make-up Policy: If you know you won't be able to make a deadline or
exam, please see me before you miss the deadline or exam. If you miss
the midterm or final, you will have to provide extensive written documentation
for your excuse.
As you generally will have a week to take them, there are no make-ups for the
quizzes.
Academic Misconduct: To state the obvious, academic dishonesty is not
allowed. Cheating on tests or on other assignments will be reported to the
University Committee on Academic Misconduct. The most common form of misconduct
is plagiarism. Remember that any time you use the ideas or the materials of
another person, you must acknowledge that you have done so in a citation. This
includes material that you have found on the Web or that was given to you by
another student by email, by telephone, in person, etc.
These are some basic class
etiquette rules that I will expect you to follow:
Text and speech encoding: Writing systems used for
language. Representing text on the computer. Digital representations of speech.
Searching: What facilities exist for
searching for language-based information? Different query languages and what
they allow you to do. Differences between specific and general queries. How to
evaluate the results of a search.
Text classification: Techniques for classifying
documents. What language(s) are they written in? Are they junk mail? Are statistical
techniques better than rule-based ones, or not? When will the techniques fail?
Writer's aids: What do so-called "grammar
checkers" and "spelling correctors" do? What do such programs base their
advice on? When does it make sense to use such tools and what kind of errors
are to be expected?
Machine translation: What do the free
internet-based translation services manage to do, and where do they fail? For
what purposes can automatic machine translation work reliably? What translation
support functions can a computer provide? A closer look at what makes machine
translation such a hard task. Is it the grammar, the meaning, the culture, all
three, or something else?
Dialog systems: Eliza and its surprising
success in engaging people in conversation. When are dialog systems used, and for
what purpose? A closer look at the components of a dialog system. Which kinds of
knowledge are needed, and where, to make it work?
Computer-aided language learning: What is involved in learning a
foreign language? What role can computers play in language learning, from
vocabulary training, via presentation of learning material, to providing
feedback on learner errors and progress?
Social context of language technology: How do we react to computers
that make use of language? What does it mean for the way we see ourselves? What
assumptions do we make about every user of language, be it a human or a
machine?
| Week | Month | Date | Day | Topic | Assignments (due at 3:30 p.m.) |
| 1 | Sep | 21 | Th | Introduction | |
| 2 | Sep | 26 | T | 1. Text and speech encoding (slides) (handout) | |
| 2 | Sep | 28 | Th | | |
| 3 | Oct | 3 | T | | |
| 3 | Oct | 5 | Th | 2. Searching (slides) (handout) | Quiz 1, HW1 |
| 4 | Oct | 10 | T | | |
| 4 | Oct | 12 | Th | | |
| 5 | Oct | 17 | T | 3. Text Classification (Spam filtering) (handout) | Quiz 2, HW2 |
| 5 | Oct | 19 | Th | | |
| 6 | Oct | 24 | T | 4. Writer's aids (handout) | |
| 6 | Oct | 26 | Th | | Quiz 3 |
| 7 | Oct | 31 | T | | |
| 7 | Nov | 2 | Th | Midterm (review sheet) | |
| 8 | Nov | 7 | T | | |
| 8 | Nov | 9 | Th | | Quiz 4 |
| 9 | Nov | 14 | T | 5. Machine Translation (handout) | |
| 9 | Nov | 16 | Th | | |
| 10 | Nov | 21 | T | | Quiz 5, HW3 |
| 10 | Nov | 23 | Th | No class | |
| 11 | Nov | 28 | T | | |
| 11 | Nov | 30 | Th | | |
| 12 | Dec | 5 | T | Final (3:30-5:18, CC 345) (review sheet) | |
Disclaimer: This syllabus is subject to change. All important
changes will be made in writing (email), with ample time for adjustment.
Students with Disabilities: Students who need an
accommodation based on the impact of a disability should contact me as soon as
possible to discuss the course format, to anticipate needs, and to explore potential
accommodations. I rely on the Office for Disability Services for assistance in
verifying the need for accommodations and developing accommodation strategies.
Students who have not previously contacted the Office for Disability Services
are encouraged to do so (292-3307; http://www.ods.ohio-state.edu).