CL · Mar 12, 2019

Character Eyes: Seeing Language through Character-Level Taggers

Yuval Pinter, Marc Marone, Jacob Eisenstein

arXiv:1903.05041v131.01091 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work gives NLP practitioners insight into language-specific tagging challenges, but it is incremental: it builds on existing character-level tagging methods.

The paper investigates how individual hidden units of a character-level LSTM behave in part-of-speech taggers across languages with different morphological properties. It links synthesis and affixation preference to the challenges taggers face, and shows that changing the balance between forward and backward hidden units affects model performance.

Character-level models have been used extensively in recent years in NLP tasks as both supplements and replacements for closed-vocabulary token-level word representations. In one popular architecture, character-level LSTMs are used to feed token representations into a sequence tagger predicting token-level annotations such as part-of-speech (POS) tags. In this work, we examine the behavior of POS taggers across languages from the perspective of individual hidden units within the character LSTM. We aggregate the behavior of these units into language-level metrics which quantify the challenges that taggers face on languages with different morphological properties, and identify links between synthesis and affixation preference and emergent behavior of the hidden tagger layer. In a comparative experiment, we show how modifying the balance between forward and backward hidden units affects model arrangement and performance in these types of languages.
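The architecture described in the abstract can be sketched roughly as follows. This is a minimal, untrained illustration in plain Python, not the authors' implementation: the class `CharLSTM`, the function `encode_token`, the embedding size, and the 12/4 split of forward and backward units are all hypothetical choices made for this example. It shows only how a token vector is built by concatenating the final state of a forward character LSTM with that of a backward one, and how the two directions can be given different numbers of hidden units.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class CharLSTM:
    """Tiny unidirectional LSTM over characters (random weights, untrained).

    Hypothetical helper for illustration; not taken from the paper's code.
    """

    def __init__(self, vocab, hidden_size, emb_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One small random embedding per known character.
        self.emb = {ch: [rng.uniform(-0.1, 0.1) for _ in range(emb_size)]
                    for ch in vocab}
        self.unk = [0.0] * emb_size  # fallback for unseen characters
        in_size = emb_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, c = candidate cell value.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(in_size)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def run(self, chars):
        """Read the characters in the given order; return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for ch in chars:
            x = self.emb.get(ch, self.unk) + h  # concatenate input and state
            pre = {g: [sum(w * v for w, v in zip(row, x)) + bias
                       for row, bias in zip(self.W[g], self.b[g])]
                   for g in "ifoc"}
            i = [_sigmoid(v) for v in pre["i"]]
            f = [_sigmoid(v) for v in pre["f"]]
            o = [_sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def encode_token(token, fwd, bwd):
    """Token vector = forward final state followed by backward final state."""
    return fwd.run(token) + bwd.run(reversed(token))


# Asymmetric budget: 12 forward units and 4 backward units (illustrative only).
vocab = "abcdefghijklmnopqrstuvwxyz"
fwd = CharLSTM(vocab, hidden_size=12, seed=0)
bwd = CharLSTM(vocab, hidden_size=4, seed=1)
token_vector = encode_token("walking", fwd, bwd)
print(len(token_vector))  # total size of the token representation
```

In a full tagger, vectors like `token_vector` would be passed to a token-level sequence model that predicts the POS tags. Changing the two `hidden_size` values while keeping their sum fixed is one way to picture the forward/backward balance the abstract refers to, and setting one of them to 0 gives a purely unidirectional encoder.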
