Projects in Computational Linguistics  



 

 


 

Current projects

    Hybrid methods for acquisition and tuning of lexical information

    Broad-coverage dictionaries and ontologies for natural language processing (NLP) are difficult and costly to create and maintain by hand. It is therefore desirable to learn them from distributional information, such as can be obtained from unlabeled or sparsely labeled text corpora. Many linguistic and psycholinguistic theories are distributional, but they emphasize local neighborhood structure more than previous NLP approaches do. Successful visualization techniques such as keyword-in-context also rely on the preservation of neighborhood structure. A similar emphasis is present in emerging techniques for data reduction, such as locally linear embedding (LLE) and min-cut algorithms, whose application to language data the project is investigating.
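
    As a rough illustration of what "preserving neighborhood structure" means for keyword-in-context displays, the following minimal sketch lists each occurrence of a keyword together with a few tokens on either side. The function name `kwic`, the `width` parameter, the simple regular-expression tokenizer and the sample sentence are all assumptions made for this example; they are not taken from the project's own tools.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) triples for each occurrence of
    `keyword`, keeping up to `width` tokens of context on each side."""
    # Hypothetical tokenizer: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Illustrative sample text (not from any project corpus).
sample = "The bank raised rates. We walked along the river bank at dusk."

for left, kw, right in kwic(sample, "bank"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>25} [{kw}] {right}")
```

    Aligning the keyword in a single column lets a reader compare the local neighborhoods of different occurrences at a glance, which is the property the paragraph above refers to.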

    While the immediate goal of the project is to gain a better understanding of lexical tuning and acquisition, the resulting dictionaries, ontologies and mapping techniques have the potential to help information professionals (such as librarians, translators, patent examiners and paralegal researchers) to navigate through corpora, to understand the significance of the data that they see, and to incorporate insights derived from the data into their working practice.

    We are integrating computational linguistics into the undergraduate curriculum of the Department of Linguistics, creating new courses designed primarily to appeal to students majoring in the humanities, and to offer such students fresh options in meeting the scientific, mathematical and quantitative components of the university's breadth requirement.

    • Funded by: National Science Foundation HLT program
    • Duration: Feb 2004-Jan 2009
    • People involved: Chris Brew, Kirk Baker, Jianguo Li and Anna Feldman (with considerable gratitude to Jiri Hana)
    • Publications:
      • Jiri Hana, Anna Feldman and Chris Brew. A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources. Empirical Methods in Natural Language Processing (EMNLP). July 2004. Barcelona, Spain.
      • Jiri Hana and Anna Feldman. Portable Language Technology: Russian via Czech. Midwest Computational Linguistics Colloquium. June 2004. Bloomington, Indiana.
      • Jiri Hana, Anna Feldman and Chris Brew. Tagging Russian using Czech resources. Invited talk. Cognitive Science Colloquium. May 14, 2004.
      • Jianguo Li and Chris Brew. Automatic extraction of subcategorisation frames from spoken corpora. Verb Workshop 2005. Saarland. February 28 - March 1, 2005.
      • Kirk Baker. Regular and irregular pseudoverb classification using XMOD. Presented at MCWOP-10.

 

Completed projects


 

Copyright © 2008 Department of Linguistics, The Ohio State University