Empirical Methods in Natural Language Generation

Ling 884 - Seminar in Computational Linguistics
Winter '06, MW 9:30-11:18, 048 Derby Hall
Instructor: Michael White
http://www.ling.ohio-state.edu/~mwhite/

Description

Natural Language Generation (NLG) is the sub-field of Natural Language Processing (NLP) that is concerned with enabling computers to convey information to people through both spoken and written language. NLG components play important roles in dialogue systems, report generators, machine translation systems, summarizers, and question answering systems. Traditionally, symbolic techniques have predominated in NLG, though recent years have seen increasing use of corpus-based and machine learning methods. As in other areas of NLP, these empirical methods hold out the promise of more robust and flexible systems that require less knowledge engineering effort to build.

In this seminar, we will critically examine recent work in this area, with the goal of better understanding its success to date and its true potential. Topics to be covered include the use of empirical methods in all areas of NLG: content selection and ordering, along with the related topic of dialogue strategy selection; sentence planning, including aggregation, referring expression generation and lexical choice; and surface realization, including modifier ordering and other forms of syntactic choice.

Prerequisites

The comp ling intro courses (684.1 and 2) or permission of the instructor. Note that much of the material should still be accessible to those who have yet to take the intro courses, so please check with the instructor if interested.

Requirements

Class discussion (15%): To facilitate discussion, each student should come to class with at least one good question in mind per assigned reading.
Presentation (35%): Much of the class time will be dedicated to student presentations of a related set of papers from the reading list. To help develop presentation skills, the student's presentation materials (slides, handouts) will be reviewed in advance with the instructor. Seminar participants will also provide constructive feedback on each presentation.
Term project (50%): The goal of the term project will be to examine an NLG subtask in depth. Projects may be conducted individually or in groups. The instructor will propose several projects involving the OpenCCG surface realizer in the COMIC dialogue system. Students may choose to work on one of these projects or propose their own.

Carmen

We're going to try using the Carmen system to schedule presentations, post advance questions on the readings, provide feedback to presenters, and discuss class projects. Carmen will also be used to provide local access to PDFs that are not readily available.

Reading List

Note that the reading list represents a starting point for the papers we will read during the quarter, with the exact set of papers to be covered depending on student interest.

NLG Intro

Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Journal of Natural Language Engineering, 3, 57-87.

Robert Dale and Ehud Reiter. 1999. EACL-99 Tutorial on Building Natural Language Generation Systems.

Dialogue Strategy Selection

Satinder Singh, Diane Litman, Michael Kearns and Marilyn Walker. 2002. Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System. In Journal of Artificial Intelligence Research (JAIR), Volume 16, pages 105-133.

James Henderson, Oliver Lemon and Kallirroi Georgila. 2005. Hybrid reinforcement/supervised learning for dialogue policies from COMMUNICATOR data. In Proc. IJCAI workshop on Knowledge and Reasoning in Practical Dialogue Systems, Edinburgh.

Kallirroi Georgila, James Henderson and Oliver Lemon. 2005. Learning User Simulations for Information State Update Dialogue Systems. In Proc. 9th European Conf. on Speech Communication and Technology (INTERSPEECH - EUROSPEECH 2005), Lisbon.

Content Selection and Ordering

Pablo A. Duboue and Kathleen R. McKeown. 2003. Statistical Acquisition of Content Selection Rules for Natural Language Generation. In Proc. of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP-03), Sapporo.

Regina Barzilay and Mirella Lapata. 2005. Collective Content Selection for Concept-To-Text Generation. In Proc. of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP).

Regina Barzilay and Lillian Lee. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In Proc. of NAACL-HLT. [local access PDF]

Nikiforos Karamanis, Chris Mellish, Jon Oberlander, and Massimo Poesio. 2004. Evaluating Centering-based metrics of coherence for text structuring using a reliably annotated corpus. In Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics.

Mirella Lapata and Regina Barzilay. 2005. Automatic Evaluation of Text Coherence: Models and Representations. In Proc. of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05).

Regina Barzilay and Mirella Lapata. 2005. Modeling Local Coherence: An Entity-Based Approach. In Proc. of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05).

Sentence Planning

Marilyn Walker, Owen Rambow and Monica Rogati. 2002. Training a Sentence Planner for Spoken Dialogue Using Boosting. Computer Speech and Language 16(3):409-433. [Final version available through OSCAR.] [local access PDF]

Daniel Hardt and Owen Rambow. 2001. Generating VP Ellipsis. In Proc. of the 39th Meeting of the Association for Computational Linguistics (ACL'01).

Shimei Pan and James Shaw. 2005. Instance-based Sentence Boundary Determination by Optimization for Natural Language Generation. In Proc. of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05).

Surface Realization

Jon Oberlander and Chris Brew. 2000. Stochastic text generation. Philosophical Transactions of the Royal Society of London, Series A, 358:1373-1385. [Final version available through OSCAR.] [local access PDF]

Irene Langkilde and Kevin Knight. 1998. Generation that Exploits Corpus-based Statistical Knowledge. In Proc. of COLING-ACL ('98). [local access PDF]

Irene Langkilde. 2000. Forest-Based Statistical Sentence Generation. In Proc. of the Association for Computational Linguistics Conference, North American chapter (NAACL-2000). [local access PDF]

Srinivas Bangalore and Owen Rambow. 2000. Exploiting a Probabilistic Hierarchical Model for Generation. In Proc. of the International Conference on Computational Linguistics (COLING-2000). [local access PDF]

Simon Corston-Oliver, Michael Gamon, Eric Ringger and Robert Moore. 2002. An overview of Amalgam: A machine-learned generation module. In Proc. of the Second International Conference on Natural Language Generation (INLG-02).

Michael White. 2005. Designing an Extensible API for Integrating Language Modeling and Realization. In Proc. ACL-05 Workshop on Software.

Michael White. 2004. Efficient Realization of Coordinate Structures in Combinatory Categorial Grammar. To appear in Research on Language and Computation.

Anja Belz. 2005. Corpus-driven generation of weather forecasts. In Proc. of the 3rd Corpus Linguistics Conference (CL-05).

John Carroll and Stephan Oepen. 2005. High efficiency realization for a wide-coverage unification grammar. In Proc. of the Second International Joint Conference on Natural Language Processing (IJCNLP-05).

Carsten Brockmann, Amy Isard, Jon Oberlander, and Michael White. 2005. Modelling alignment for affective dialogue. In Proc. of the UM-05 Workshop on Adapting the Interaction Style to Affective Factors.

Daniel S. Paiva and Roger Evans. 2005. Empirically-based Control of Natural Language Generation. In Proc. of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05).

Coupling NLG and Speech Synthesis

Ivan Bulyko and Mari Ostendorf. 2002. Efficient integrated response generation from multiple targets using weighted finite state transducers. Computer Speech and Language 16(3):533-550. [Available through OSCAR.] [local access PDF]

Alice Oh and Alex Rudnicky. 2002. Stochastic natural language generation for spoken dialog systems. Computer Speech and Language 16(3):387-407. [Available through OSCAR.] [local access PDF]

Shimei Pan, Kathleen McKeown and Julia Hirschberg. 2002. Exploring features from natural language generation for prosody modeling. Computer Speech and Language 16(3):457-490. [Available through OSCAR.] [local access PDF]

Matthew Stone, Doug DeCarlo, Insuk Oh, Christian Rodriguez, Adrian Stere, Alyssa Lees and Chris Bregler. 2004. Speaking with hands: Creating Animated Conversational Characters from Recordings of Human Performance. ACM Transactions on Graphics 23(3) (SIGGRAPH).

Shimei Pan and Wubin Weng. 2002. Designing a Speech Corpus for Instance-based Spoken Language Generation. In Proc. of the International Natural Language Generation Conference (INLG-02).

Mary Ellen Foster and Jon Oberlander. 2006. Data-driven Generation of Emphatic Facial Displays. To appear in Proc. EACL-06.

(Lexico-)Grammar and Paraphrase Acquisition

Huayan Zhong and Amanda Stent. 2005. Building Surface Realizers Automatically From Corpora Using General-Purpose Tools. In Proc. Corpus Linguistics '05 Workshop on Using Corpora for Natural Language Generation.

Tomasz Marciniak and Michael Strube. 2005. Using an Annotated Corpus as a Knowledge Source for Language Generation. In Proc. Corpus Linguistics '05 Workshop on Using Corpora for Natural Language Generation.

Rashmi Prasad, Aravind Joshi, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki and Bonnie Webber. 2005. The Penn Discourse TreeBank as a Resource for Natural Language Generation. In Proc. Corpus Linguistics '05 Workshop on Using Corpora for Natural Language Generation.

Regina Barzilay and Lillian Lee. 2003. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment. In Proc. of NAACL-HLT. [local access PDF]

Regina Barzilay and Lillian Lee. 2002. Bootstrapping Lexical Choice via Multiple-Sequence Alignment. In Proc. of EMNLP. [local access PDF]

Regina Barzilay and Kathleen McKeown. 2001. Extracting Paraphrases from a Parallel Corpus. In Proc. of ACL/EACL. [local access PDF]

Bo Pang, Kevin Knight and Daniel Marcu. 2003. Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences. In Proc. of HLT/NAACL.

Evaluation

Srinivas Bangalore, Owen Rambow and Steven Whittaker. 2000. Evaluation Metrics for Generation. In Proc. of the International Conference on Natural Language Generation (INLG-2000). [local access PDF]

Irene Langkilde-Geary. 2002. An Empirical Verification of Coverage and Correctness for a General-Purpose Sentence Generator. In Proc. of the International Natural Language Generation Conference (INLG-02).

Charles Callaway. 2003. Evaluating Coverage for Large Symbolic NLG Grammars. In Proc. of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03).

Ehud Reiter and Somayajulu Sripada. 2002. Should Corpora Texts Be Gold Standards for NLG? In Proc. of the International Natural Language Generation Conference (INLG-02).

Somayajulu Sripada, Ehud Reiter and Lezan Hawizy. 2005. Evaluating an NLG System using Post-Edit Data: Lessons Learned. In Proc. of ENLG-05.

Policy on Academic Misconduct

As with any class at this university, students are required to follow the Ohio State Code of Student Conduct. In particular, note that students are not allowed to, among other things, submit plagiarized (copied but unacknowledged) work for credit. If any violation occurs, the instructor is required to report the violation to the Committee on Academic Misconduct.