CL Apr 27, 2023

Cross-Domain Evaluation of POS Taggers: From Wall Street Journal to Fandom Wiki

arXiv:2304.13989v13 citations h-index: 19
Originality: Synthesis-oriented
AI Analysis

This work highlights the limitations of POS taggers in out-of-domain settings, which matters to NLP researchers and practitioners working with specialized or informal text. It is incremental, however: it applies existing methods to new data without introducing novel techniques.

The study evaluated the cross-domain performance of two POS taggers, Stanford and Bilty, both trained on Wall Street Journal data, on a Fandom Wiki dataset. Accuracy on unknown tokens dropped from 90.37% to 78.37% for Stanford (12.00 percentage points) and from 87.84% to 80.41% for Bilty (7.43 percentage points).

The Wall Street Journal section of the Penn Treebank has been the de facto standard for evaluating POS taggers for a long time, and accuracies over 97% have been reported. However, less is known about out-of-domain tagger performance, especially with fine-grained label sets. Using data from Elder Scrolls Fandom, a wiki about the Elder Scrolls video game universe, we create a modest dataset for qualitatively evaluating the cross-domain performance of two POS taggers: the Stanford tagger (Toutanova et al. 2003) and Bilty (Plank et al. 2016), both trained on WSJ. Our analyses show that performance on tokens seen during training is almost as good as in-domain performance, but accuracy on unknown tokens decreases from 90.37% to 78.37% (Stanford) and 87.84% to 80.41% (Bilty) across domains. Both taggers struggle with proper nouns and inconsistent capitalization.
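The known/unknown split reported in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that "unknown" means the exact surface form never occurred in the training vocabulary, which the abstract does not spell out. The function name `split_accuracy` and the toy tokens and tags are hypothetical.

```python
from typing import Dict, List, Optional, Set


def split_accuracy(
    train_vocab: Set[str],
    tokens: List[str],
    gold_tags: List[str],
    pred_tags: List[str],
) -> Dict[str, Optional[float]]:
    """Return tagging accuracy separately for known and unknown tokens.

    Assumption: a token is "known" if its exact (case-sensitive) surface
    form is in train_vocab. A bucket with no tokens yields None.
    """
    if not (len(tokens) == len(gold_tags) == len(pred_tags)):
        raise ValueError("tokens, gold_tags and pred_tags must be aligned")

    # Per bucket: [number of correct predictions, number of tokens]
    counts = {"known": [0, 0], "unknown": [0, 0]}
    for token, gold, pred in zip(tokens, gold_tags, pred_tags):
        bucket = "known" if token in train_vocab else "unknown"
        counts[bucket][0] += int(gold == pred)
        counts[bucket][1] += 1

    return {
        bucket: (correct / total if total else None)
        for bucket, (correct, total) in counts.items()
    }


# Toy data, invented for illustration only.
train_vocab = {"the", "dragon", "flies"}
tokens = ["the", "Dovahkiin", "flies", "Skyrim"]
gold_tags = ["DT", "NNP", "VBZ", "NNP"]
pred_tags = ["DT", "NN", "VBZ", "NNP"]

print(split_accuracy(train_vocab, tokens, gold_tags, pred_tags))
# {'known': 1.0, 'unknown': 0.5}
```

Because the membership test is case-sensitive, a token such as "Dragon" would count as unknown even when "dragon" is in the vocabulary. That choice interacts with the inconsistent capitalization the abstract mentions, so a real evaluation would need to state it explicitly.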

Code Implementations: 1 repo