
Michael White

Asst. Professor
Dept. of Linguistics
The Ohio State University
219 Oxley Hall
1712 Neil Ave.
Columbus, OH 43210
tel: (614) 247-4732, fax: (614) 292-8833
office hrs: Wed. 2:30-3:30 or by appt.
email: please use the address consisting of my last name followed by dot (no space) then the number one thousand two hundred and forty then the at symbol then osu dot edu (and nothing more) for all official OSU business, especially any email possibly containing FERPA-protected records


About Mike

I'm now one of 2.5 faculty members in the Linguistics department specializing in computational linguistics. I am also affiliated with the speech and language technologies lab in the Department of Computer Science & Engineering. My research interests are in natural language generation, spoken language and multimodal dialogue systems, and the connection between natural language generation and speech synthesis.

Prior to joining the faculty here, I was a Research Fellow in the School of Informatics at the University of Edinburgh. Before crossing the pond to Scotland, I worked for many years at CoGenTex, Inc., a small company dedicated to developing commercial natural language generation software, as well as advancing research in NLG. Before joining the CGT crew, I obtained a Ph.D. in computer science from the University of Pennsylvania.

(Full CV)

Teaching

Calendar

Activities

Some recent activities:

Projects

Building Expressive Synthetic Voices for Conversational Systems

OSU Arts & Humanities Innovation Grant, 2007–2009

Collaborators: Chris Brew, Dominic Espinosa, Eric Fosler-Lussier, Kiwako Ito, Rajakrishnan Rajkumar, Shari Speer

Abstract. The focus of the project is investigating methods of building synthetic voices for conversational systems that are capable of expressing natural and contextually appropriate intonation. While data-driven techniques for producing synthetic speech have made great strides in the past ten years, general-purpose synthetic voices are at present good only at synthesizing declarative sentences with neutral intonation. Neutral intonation does not suffice, however, in conversational systems: it sounds disengaged or "dead", and it is often misleading as to the intended meaning. To overcome this impasse, we will pursue recently developed techniques for custom-building expressive synthetic voices that target the capabilities of particular conversational systems.

The specific objectives of the project are twofold. Firstly, we will investigate the extent to which custom synthetic voices can produce natural sounding intonation via a psycholinguistic experiment. To do so, we will use an expressive synthetic voice, rather than recorded human speech, to replicate recent eye-tracking experiments which investigated the role of pitch accents during online discourse comprehension. These experiments demonstrated a processing advantage for contextually appropriate as compared to inappropriate uses of pitch accents in instructions. Eye movement monitoring is an ideal method of evaluating speech synthesis, since it provides an objective, non-intrusive, implicit measure of processing difficulty; how people process synthetic speech is also an interesting question in its own right. Secondly, we will devise a new, utility-based algorithm for optimizing the selection of intonationally varied sentences to record when building a custom synthetic voice, and evaluate its effectiveness in a perception experiment. This algorithm will fill in a crucial missing piece of the expressive synthesis puzzle, as most existing text selection algorithms do not take prosody into account.
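
The abstract does not spell out the utility function, so what follows is only a minimal sketch of what greedy, utility-based text selection can look like. It assumes that each candidate sentence has already been annotated with a set of hypothetical prosodic unit labels (a ToBI-style pitch accent or boundary tone, optionally paired with a word class) and that each label carries a hand-set utility weight. The function names, labels and weights are all illustrative; this is not the project's algorithm.

```python
from typing import Dict, List, Set


def marginal_gain(units: Set[str], covered: Set[str], weights: Dict[str, float]) -> float:
    """Summed weight of the units in `units` that are not yet in `covered`."""
    return sum(weights.get(u, 0.0) for u in units - covered)


def greedy_select(
    candidates: Dict[str, Set[str]],
    weights: Dict[str, float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` sentence ids, each time taking the candidate that adds
    the most not-yet-covered weight (ties go to the earliest candidate)."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)
    while remaining and len(chosen) < budget:
        best_id = max(remaining, key=lambda sid: marginal_gain(remaining[sid], covered, weights))
        if marginal_gain(remaining[best_id], covered, weights) <= 0:
            break  # no remaining sentence covers anything new
        chosen.append(best_id)
        covered |= remaining.pop(best_id)
    return chosen


# Hypothetical unit labels and weights, for illustration only.
candidates = {
    "s1": {"H*:noun", "L-L%"},
    "s2": {"L+H*:adj", "L-L%"},
    "s3": {"H*:noun", "L+H*:adj", "H-H%"},
}
weights = {"H*:noun": 1.0, "L+H*:adj": 2.0, "L-L%": 1.0, "H-H%": 1.5}
print(greedy_select(candidates, weights, budget=2))  # ['s3', 's1']
```

Greedy selection is the usual heuristic for coverage-style objectives like this one; a real prosody-aware selector would likely weigh utility quite differently, for instance by how often a unit is expected to occur in the target conversational system's output.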

Learning to Generate High Quality Paraphrases with a Broad Coverage Lexicalized Grammar

NSF IIS - Robust Intelligence Grant, 2008–2011

Collaborators: Steve Boxwell, Dominic Espinosa, Scott Martin, Dennis Mehay, Crystal Nakatsu, Rajakrishnan Rajkumar

Abstract. Research on automatic paraphrase generation has been gaining steam in recent years. Automatic paraphrasing is considered vital to applications as diverse as machine translation (MT), question answering, summarization, and dialogue systems. Paraphrasing has also recently been shown to hold promise for automatic methods of evaluating MT, when the paraphrases are of sufficiently high quality.

This project investigates novel methods for acquiring and generating such high-quality paraphrases in order to automatically approximate the human-targeted translation edit rate (HTER) metric for MT evaluation, where human annotators post-edit MT outputs into acceptable paraphrases of the reference translations. The project emphasizes the use of a linguistically informed, grammar-based parser and realizer for acquiring and generating paraphrases using disjunctive logical forms (DLFs), in sharp contrast to most recent work, which relies entirely on shallow methods. Specifically, the project investigates methods of (1) engineering a broad-coverage English grammar from the CCGbank, with semantic roles integrated from Propbank; (2) scaling up OpenCCG for efficient parsing and realization with this grammar, adapting supertagging and parse ranking methods for generation; (3) adapting and extending previous methods of acquiring paraphrases to work on DLFs; (4) generating high-quality n-best paraphrases of one or more reference sentences; and (5) experimentally evaluating whether the automatically generated paraphrases can be used with current MT metrics to yield improved correlations with human judgments of translation quality.
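
As a rough illustration of why paraphrased references can help, the sketch below computes a simplified, word-level edit rate in the spirit of TER: the minimum number of word insertions, deletions and substitutions needed to turn the MT output into the closest reference, divided by the average reference length. Real TER/HTER also allows block shifts, which are omitted here, and the function names and example sentences are hypothetical.

```python
from typing import List, Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions).
    Unlike TER, block shifts are not modelled."""
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis prefix
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                cur[j - 1] + 1,          # insert reference word r
                prev[j - 1] + (h != r),  # match (cost 0) or substitute (cost 1)
            )
        prev = cur
    return prev[-1]


def approx_edit_rate(hyp: str, refs: List[str]) -> float:
    """Edits to the closest reference divided by the average reference length."""
    hyp_words = hyp.split()
    ref_words = [r.split() for r in refs]
    min_edits = min(word_edit_distance(hyp_words, rw) for rw in ref_words)
    avg_len = sum(len(rw) for rw in ref_words) / len(ref_words)
    return min_edits / avg_len


mt_output = "the cat was sitting on the mat"
reference = "the cat sat on the mat"
paraphrase = "the cat was sitting on the mat"  # hypothetical generated paraphrase
print(approx_edit_rate(mt_output, [reference]))              # 2 edits / 6 words
print(approx_edit_rate(mt_output, [reference, paraphrase]))  # 0 edits -> 0.0
```

An acceptable MT output that happens to differ in wording from the single reference is penalized until a matching paraphrase is added to the reference set, which is the intuition behind approximating HTER's human post-edits with automatically generated paraphrases.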

By providing a way to automatically approximate the HTER metric, the project will help drive future MT research. Additionally, by dramatically extending the realization capacity of OpenCCG, the project promises to benefit a wide range of NLP tasks where the breadth of target texts is of crucial importance.

Recent Papers

Michael White, Robert A. J. Clark and Johanna D. Moore. 2009. Generating tailored, comparative descriptions with contextually appropriate intonation. To appear in Computational Linguistics.

Michael White, Rajakrishnan Rajkumar, Kiwako Ito and Shari Speer. 2009. Eye Tracking for the Online Evaluation of Prosody in Speech Synthesis: Not So Fast! In Proc. of the 10th Annual Conference of the International Speech Communication Association (INTERSPEECH-09).

Michael White and Rajakrishnan Rajkumar. 2009. Perceptron Reranking for CCG Realization. In Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2009). (bib)

Rajakrishnan Rajkumar, Michael White and Dominic Espinosa. 2009. Exploiting Named Entity Classes in CCG Surface Realization. In Proc. of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2009). (bib) (poster)

Scott Martin, Rajakrishnan Rajkumar and Michael White. 2009. Grammar Engineering for CCG using Ant and XSLT. In Proc. of the NAACL HLT 2009 Workshop on Software Engineering, Testing and Quality Assurance for Natural Language Processing (SETQA-NLP 2009). (bib) (poster)

Michael White and Rajakrishnan Rajkumar. 2008. A More Precise Analysis of Punctuation for Broad-Coverage Surface Realization with CCG. In Proc. of the Workshop on Grammar Engineering Across Frameworks (GEAF08).

Dominic Espinosa, Michael White and Dennis Mehay. 2008. Hypertagging: Supertagging for Surface Realization with CCG. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT). (bib)

Stephen A. Boxwell and Michael White. 2008. Projecting Propbank Roles onto the CCGbank. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC-08).

Robert Dale and Michael White, editors. 2007. Report from the Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation.

Vasile Rus, Arthur C. Graesser, Amanda Stent, Marilyn Walker and Michael White. 2007. Text-to-Text Generation. In Report from the Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation.

Michael White, Rajakrishnan Rajkumar and Scott Martin. 2007. Towards Broad Coverage Surface Realization with CCG. In Proc. of the 2007 Workshop on Using Corpora for NLG: Language Generation and Machine Translation (UCNLG+MT).

Mary Ellen Foster and Michael White. 2007. Avoiding Repetition in Generated Text. In Proc. of the 11th European Workshop on Natural Language Generation. (bib)

Robert Dale and Michael White, editors. 2007. Position Papers of the Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation.

Michael White. 2006. CCG Chart Realization from Disjunctive Inputs. In Proc. of the 4th International Conference on Natural Language Generation (INLG-06). (bib)

Crystal Nakatsu and Michael White. 2006. Learning to Say It Well: Reranking Realizations by Predicted Synthesis Quality. In Proc. of COLING-ACL-06. (bib)

Michael White. 2006. Efficient Realization of Coordinate Structures in Combinatory Categorial Grammar. Research on Language and Computation, 4(1):39–75. (prefinal version)

Mary Ellen Foster and Michael White. 2005. Assessing the Impact of Adaptive Generation in the COMIC Multimodal Dialogue System. In Proc. of the IJCAI-05 Workshop on Knowledge and Reasoning in Practical Dialogue Systems.

Carsten Brockmann, Amy Isard, Jon Oberlander, and Michael White. 2005. Modelling alignment for affective dialogue. In Proc. of the UM-05 Workshop on Adapting the Interaction Style to Affective Factors.

Michael White, Mary Ellen Foster, Jon Oberlander, and Ash Brown. 2005. Using Facial Feedback to Enhance Turn-Taking in a Multimodal Dialogue System. In Proc. of the HCI International 2005 Thematic Session on Universal Access in Human-Computer Interaction.

Michael White. 2005. Designing an Extensible API for Integrating Language Modeling and Realization. In Proc. of the ACL-05 Workshop on Software.

Mary Ellen Foster, Michael White, Andrea Setzer, and Roberta Catizone. 2005. Generating Multimodal Output in the COMIC Dialogue System. ACL 2005 Demo Session. (Poster [A0 PDF])

Mary Ellen Foster and Michael White. 2004. Techniques for Text Planning with XSLT. In Proc. of the 4th NLPXML Workshop.

Michael White. 2004. Reining in CCG Chart Realization. In Proc. of the 3rd International Conference on Natural Language Generation (INLG-04).

Rachel Baker, Robert A. J. Clark, and Michael White. 2004. Synthesising Contextually Appropriate Intonation in Limited Domains. In Proc. of the 5th ISCA Speech Synthesis Workshop.

Johanna Moore, Mary Ellen Foster, Oliver Lemon, and Michael White. 2004. Generating Tailored, Comparative Descriptions in Spoken Dialogue. In Proc. of the 17th International FLAIRS Conference.

Michael White and Jason Baldridge. 2003. Adapting Chart Realization to CCG. In Proc. of the 9th European Workshop on Natural Language Generation.