Proceedings of EACL'03.
We propose a new method for detecting errors in "gold-standard" part-of-speech annotation. The approach locates errors with high precision based on n-grams occurring in the corpus with multiple taggings. Two further techniques, closed-class analysis and finite-state tagging guide patterns, are discussed. The success of the three approaches is illustrated on the Wall Street Journal corpus, part of the Penn Treebank.
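The core intuition of the n-gram approach can be illustrated with a short sketch. A word sequence that recurs in the corpus but carries different tag sequences is a candidate for an annotation error, and the positions where the tags differ are the candidate error sites. This is a simplified illustration and not the authors' released variation n-gram code. The corpus format (a flat list of `(word, tag)` pairs) and the function name `variation_ngrams` are hypothetical, and the sketch applies none of the paper's heuristics for ranking candidates or growing contexts.

```python
from collections import defaultdict


def variation_ngrams(corpus: list[tuple[str, str]], n: int) -> dict:
    """Find word n-grams that occur with more than one tag sequence.

    Hypothetical simplified sketch: `corpus` is assumed to be a flat list of
    (word, tag) pairs. Returns a dict mapping each varying word n-gram to
    (positions whose tag differs across occurrences, sorted tag sequences).
    """
    taggings = defaultdict(set)
    # Slide a window of length n over the corpus and record each tagging.
    for i in range(len(corpus) - n + 1):
        window = corpus[i:i + n]
        words = tuple(w for w, _ in window)
        tags = tuple(t for _, t in window)
        taggings[words].add(tags)

    result = {}
    for words, tag_seqs in taggings.items():
        if len(tag_seqs) > 1:
            # Positions where occurrences disagree are candidate error sites.
            varying = [k for k in range(n)
                       if len({seq[k] for seq in tag_seqs}) > 1]
            result[words] = (varying, sorted(tag_seqs))
    return result


# Toy example data (invented for illustration): "rose" is tagged
# inconsistently in two otherwise identical contexts.
corpus = [("the", "DT"), ("shares", "NNS"), ("rose", "VBD"), (".", "."),
          ("the", "DT"), ("shares", "NNS"), ("rose", "NN"), (".", ".")]

for ngram, (positions, tag_seqs) in variation_ngrams(corpus, 3).items():
    print(ngram, positions, tag_seqs)
```

Longer recurring contexts make it less likely that the differing tags reflect a genuine ambiguity, which is why a sketch like this would be run for increasing values of `n`.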
BibTeX entry:
@InProceedings{dickinson:meurers:03,
author = {Markus Dickinson and W. Detmar Meurers},
title = {Detecting Errors in Part-of-Speech Annotation},
booktitle = {Proceedings of the 10th Conference of the European
Chapter of the Association for Computational Linguistics
(EACL-03)},
pages = {107--114},
address = {Budapest, Hungary},
year = {2003},
url = {http://ling.osu.edu/~dickinso/papers/dickinson-meurers-03.html}
}
The variation n-gram code used in the paper is freely available. Just send me an e-mail at the address below.