CL · Mar 12, 2019

Character Eyes: Seeing Language through Character-Level Taggers

arXiv:1903.05041v1 · 1091 citations
Originality: Synthesis-oriented
AI Analysis

This work provides insights into language-specific challenges for NLP practitioners, but it is incremental as it builds on existing character-level tagging methods.

The paper investigated how character-level LSTM hidden units behave in part-of-speech taggers across languages with different morphological properties, linking synthesis and affixation to tagger challenges and showing that modifying forward-backward unit balance affects model performance.

Character-level models have been used extensively in recent years in NLP tasks as both supplements and replacements for closed-vocabulary token-level word representations. In one popular architecture, character-level LSTMs are used to feed token representations into a sequence tagger predicting token-level annotations such as part-of-speech (POS) tags. In this work, we examine the behavior of POS taggers across languages from the perspective of individual hidden units within the character LSTM. We aggregate the behavior of these units into language-level metrics which quantify the challenges that taggers face on languages with different morphological properties, and identify links between synthesis and affixation preference and emergent behavior of the hidden tagger layer. In a comparative experiment, we show how modifying the balance between forward and backward hidden units affects model arrangement and performance in these types of languages.
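The architecture described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it uses randomly initialised, untrained weights and only the Python standard library, and the names `CharLSTM` and `token_representation`, the embedding size, and the 6/2 split of hidden units are all hypothetical choices made for illustration. It shows how a forward and a backward character LSTM with different numbers of hidden units can be combined into one token representation, which is the "balance" the comparative experiment varies.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class CharLSTM:
    """Single-direction LSTM over characters (random, untrained weights)."""

    def __init__(self, vocab, hidden_size, emb_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One embedding vector per character of the (hypothetical) vocabulary.
        self.emb = {ch: [rng.uniform(-0.5, 0.5) for _ in range(emb_size)]
                    for ch in vocab}
        self.unk = [0.0] * emb_size  # fallback for unseen characters
        in_size = emb_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, g = candidate cell state.
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(in_size)]
                         for _ in range(hidden_size)]
                  for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _gate(self, gate, x, act):
        return [act(sum(w * v for w, v in zip(row, x)) + bias)
                for row, bias in zip(self.W[gate], self.b[gate])]

    def encode(self, chars):
        """Run the LSTM over the characters and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for ch in chars:
            x = self.emb.get(ch, self.unk) + h  # concatenate input and state
            i = self._gate("i", x, sigmoid)
            f = self._gate("f", x, sigmoid)
            o = self._gate("o", x, sigmoid)
            g = self._gate("g", x, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def token_representation(word, fwd, bwd):
    """Concatenate the forward final state with the backward final state."""
    return fwd.encode(word) + bwd.encode(reversed(word))


vocab = "abcdefghijklmnopqrstuvwxyz"
# Unbalanced split: 6 forward units and 2 backward units (illustrative only).
fwd = CharLSTM(vocab, hidden_size=6, seed=1)
bwd = CharLSTM(vocab, hidden_size=2, seed=2)
vec = token_representation("walked", fwd, bwd)
print(len(vec))  # 8 = 6 forward units + 2 backward units
```

In a full tagger, a vector like `vec` would be fed per token into a word-level sequence model that predicts POS tags. Shifting units between `fwd` and `bwd` while keeping the total fixed is one plausible way to set up the forward-backward comparison the abstract mentions.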

Code Implementations: 1 repo
