Projects in Computational Linguistics
Current projects
Hybrid methods for acquisition and tuning of lexical information
Broad-coverage dictionaries and ontologies for natural language processing (NLP) are difficult and costly to create and maintain by hand. It is therefore desirable to learn them from distributional information, such as can be obtained from unlabeled or sparsely labeled text corpora. Many linguistic and psycholinguistic theories are distributional, but they emphasize local neighborhood structure more than previous NLP approaches have. Successful visualization techniques such as keyword-in-context also rely on the preservation of neighborhood structure. A similar emphasis is present in emerging techniques for data reduction, such as locally linear embedding (LLE) and min-cut algorithms, whose application to language data the project is investigating.
While the immediate goal of the project is to gain a better understanding of lexical tuning and acquisition, the resulting dictionaries, ontologies and mapping techniques have the potential to help information professionals (such as librarians, translators, patent examiners and paralegal researchers) navigate corpora, understand the significance of the data they see, and incorporate insights derived from the data into their working practice.
We are also integrating computational linguistics into the undergraduate curriculum of the Department of Linguistics. The new courses are designed primarily to appeal to students majoring in the humanities and to offer them fresh options for meeting the scientific, mathematical and quantitative components of the university's breadth requirement.
Completed projects
DECCA: Detection of Errors and Correction in Corpus Annotation
The success of data-driven approaches and stochastic modeling in computational linguistic research and applications is rooted in the availability of electronic natural language corpora. Despite the central role that annotated corpora play in this work, the question of how errors in corpus annotation can be detected and corrected has received little attention. The project is designed to address this gap by exploring an error detection and correction method that is applicable to a wide range of corpus annotations.
TAGARELA: Bridging the Gap between Research in Natural Language Processing and Individualized Language Instruction
CoGETI: Constraint-based Grammar: Data, Theory, and Implementation
MiLCA: Media-intensive teaching modules in the computational linguistics curriculum
From Corpus Resources to Linguistic Phenomena:
Copyright © 2008 Department of Linguistics, The Ohio State University