Linguistics 384: Language and Computers

(Winter 2007)

Course goals: In the past decade, the widening use of computers has had a profound influence on the way ordinary people communicate, search for and store information. For the overwhelming majority of people and situations, the natural vehicle for such information is natural language. Text and, to a lesser extent, speech are crucial encoding formats for the information revolution.

In this course, you will gain insight into the fundamentals of how computers are used to represent, process and organize textual and spoken information, as well as tips on how to integrate this knowledge effectively into working practice. We will cover the theory and practice of human language technology. Topics include text encoding, speech processing, search technology, document classification, tools for writing support, machine translation and computer-aided language learning.

GEC: The course satisfies the GEC category 2B (Mathematical and Logical Analysis). It does so by using natural language systems to motivate students to exercise and develop a range of basic skills in formal and computational analysis. The course philosophy is to ground abstract concepts in real-world examples. We introduce strings, regular expressions, finite-state and context-free grammars, as well as algorithms defined over these structures and techniques for probing and evaluating systems that rely on these algorithms. The course goes beyond merely subjective evaluation of systems, emphasizing analysis and reasoning to draw and argue for valid conclusions about the design, capabilities and behavior of natural language systems.

Instructor: Arantxa Martín-Lozano

Course Coordinator: If you have any questions about this course, you can contact Dr. Hope Dawson, TA coordinator of the Linguistics Department. E-mail: hdawson@ling.ohio-state.edu. Office: Oxley Hall 222A.

Course meets:

Carmen: We'll be using the Carmen course management tool for the course, which is accessible at http://carmen.osu.edu. You'll use it to

Note that email from Carmen is sent to the official email addresses (Name.Number@osu.edu) of the students enrolled in the class and the instructor. You should read email sent to your official OSU account on a daily basis.

Carmen and privacy: Be aware that the Carmen system, as it is set up at OSU, keeps detailed logs of your interaction with the system, e.g., when you log in, how long you take to complete each question of a quiz, etc.

Readings: There is no textbook for this course (the topic is quite new, at OSU and elsewhere). There will be some readings assigned periodically throughout the course.

I will distribute slides in class for each unit. These will also be available on the web after the class in which they are first distributed. These slides are only a skeleton of the material covered; they cannot replace actually being in class. In my experience, students who actively participate in class enjoy the course more and get much better grades than those who don't.

Course requirements: The basic requirement is regular attendance in class and active participation. There will be eight online quizzes, to ensure the material covered in class is mastered. The midterm will cover the material from the first half of the class, and the final will cover the content of the second half of the class.

Grading: Grades will be based on attendance, participation in classroom discussion and group work, quizzes, a midterm exam, and the final examination, using the following scheme:
Attendance 10%  
Participation 5%  
Quizzes 45%  
Midterm 20%  
Final 20%  
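
As an illustration of how the weights above combine, here is a minimal sketch in Python. It assumes that each component is scored on a 0-100 scale and that the components are combined as a simple weighted average; the syllabus does not spell out how the individual quizzes are averaged or how percentages map to letter grades, so neither is modeled here. The function name and the example student are hypothetical.

```python
# Weights taken from the grading scheme above (percent of the course grade).
WEIGHTS = {
    "attendance": 10,
    "participation": 5,
    "quizzes": 45,
    "midterm": 20,
    "final": 20,
}


def course_percentage(scores):
    """Return the weighted course percentage.

    `scores` maps each component name to a score between 0 and 100.
    Assumption: components are combined as a plain weighted average.
    """
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student, for illustration only.
example = {
    "attendance": 100,
    "participation": 80,
    "quizzes": 90,
    "midterm": 75,
    "final": 85,
}
print(course_percentage(example))  # prints 86.5
```

Because quizzes carry 45% of the grade, a ten-point change in the quiz average moves the course percentage by 4.5 points, more than the same change on either exam.
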
Make-up Policy: Since you generally will have a week to take them, there will be no make-ups for quizzes. If you miss the midterm or final, you will have to provide extensive written documentation for your excuse.

Academic Misconduct: To state the obvious, academic dishonesty is not allowed. Cheating on tests or on other assignments will be reported to the University Committee on Academic Misconduct. The most common form of misconduct is plagiarism. Remember that any time you use the ideas or the materials of another person, you must acknowledge that you have done so in a citation. This includes material that you have found on the Web or that was given to you by another student by email, telephone, in person, etc. The University provides guidelines for research on the Web at http://gateway.lib.ohio-state.edu/tutor/ and you can find the Student Code of Conduct at http://studentaffairs.osu.edu/resource_csc.asp

Non-academic Misconduct: Everyone in the class will be expected to be respectful of others. Behaviours contributing to the creation of a hostile working atmosphere will be reported to student judicial affairs.

Class etiquette: In the best interest of everyone in the class, please observe the following principles of class etiquette.

Topics:
  1. Storing language on the computer: Text and speech encoding.
    Writing systems used for language. Representing text on the computer. Digital representations of speech.
  2. Searching: web, library catalogs, and other language-based databases
    What facilities exist for searching for language-based information? Different query languages and what they allow you to do. Differences between specific and general queries. How to evaluate the results of a search.
  3. Classifying documents: Language identification and spam filtering
    Techniques for classifying documents. What language(s) are they written in? Are they junk mail? Are statistical techniques better than rule-based ones, or not? When will the techniques fail?
  4. Writer's aids: Spelling and grammar correction
    What do so-called "grammar checkers" and "spelling correctors" do? What do such programs base their advice on? When does it make sense to use such tools and what kind of errors are to be expected?
  5. Machine translation
    What do the free internet-based translation services manage to do, and where do they fail? For what purposes can automatic machine translation work reliably? What translation support functions can a computer provide? A closer look at what makes machine translation such a hard task. Is it the grammar, the meaning, the culture, all three, or something else?
  6. Computer-Aided Language Learning
    What is involved in learning a foreign language? What role can computers play in language learning, from vocabulary training, through presentation of learning material, to providing feedback on learner errors and progress?
Schedule: The latest version of the schedule is always available from our web page. After each lecture, the topic titles in the schedule below link to the handouts we used (in PDF format).

Week  Date       Day   Topic                                         Assignments (due before class)
1     Jan 4      Th    Introduction
2     Jan 9      Tu    1. Text and speech encoding
2     Jan 11     Th    1. Text and speech encoding
3     Jan 16     Tu    1. Text and speech encoding                   Quiz 1
3     Jan 18     Th    2. Searching                                  Quiz 2
4     Jan 23     Tu    2. Searching
4     Jan 25     Th    2. Searching
5     Jan 30     Tu    3. Document Classification (Spam Detection)   Quiz 3
5     Feb 1      Th    3. Text classification
6     Feb 6      Tu    3. Text classification
6     Feb 8      Th    Midterm (review sheet)                        Quiz 4
7     Feb 13     Tu    4. Writer's Aids
7     Feb 15     Th    4. Writer's aids
8     Feb 20     Tu    4. Writer's aids
8     Feb 22     Th    5. Machine Translation
9     Feb 27     Tu    5. Machine Translation                        Quiz 5
9     Mar 1      Th    5. Machine Translation
10    Mar 6      Tu    6. Computer Aided Language Learning           Quiz 6
10    Mar 8      Th    6. Computer Aided Language Learning
11    Mar 12-15  M-Th  Finals Week
        Final (review sheet), Section 2 (call #122844): Wednesday, March 14, 11:30-1:18, CC 345. Quiz 7 due.
        Final (review sheet), Section 1 (call #122865): Thursday, March 15, 1:30-3:18, CC 345. Quiz 7 due.

Handouts for the first session of each unit come in two versions: one to work from on screen and one to print (9up).

Disclaimer: This syllabus is subject to change.



Students with Disabilities: Students who need an accommodation based on the impact of a disability should contact me as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations. I rely on the Office for Disability Services for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted the Office for Disability Services are encouraged to do so (292-3307; http://www.ods.ohio-state.edu).

