CL, Aug 9, 2015

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

arXiv:1508.02096v2, 648 citations
Originality: Incremental advance
AI Analysis

This paper addresses open-vocabulary word representation for natural language processing, a challenge that is most acute in languages with complex morphology. The advance is incremental, since it builds on existing compositional and LSTM methods.

The authors construct word representations by composing characters with bidirectional LSTMs, achieving state-of-the-art results in language modeling and part-of-speech tagging, with particularly strong gains in morphologically rich languages such as Turkish.

We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).
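The composition described in the abstract can be sketched in a few lines of dependency-free Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the class names `LSTMCell` and `CharToWord`, the dimensions, the random initialization, and the choice to combine the two final LSTM states with a linear projection are all hypothetical choices made for the example. No training is shown. The point is only that the parameters are one vector per character plus a fixed set of composition weights, so any string over the alphabet receives a vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    # Small random weights; the initialization scheme is an assumption.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """Standard LSTM cell; every gate reads the concatenation [x; h]."""

    GATES = ("input", "forget", "output", "cand")

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.w = {n: rand_matrix(hidden_dim, input_dim + hidden_dim, rng)
                  for n in self.GATES}
        self.b = {n: [0.0] * hidden_dim for n in self.GATES}

    def step(self, x, h, c):
        xh = x + h  # list concatenation of input and previous hidden state
        pre = {n: [a + b for a, b in zip(matvec(self.w[n], xh), self.b[n])]
               for n in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["cand"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, inputs):
        # Read the whole sequence and return the final hidden state.
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in inputs:
            h, c = self.step(x, h, c)
        return h


class CharToWord:
    """Hypothetical character-to-word composer (untrained)."""

    def __init__(self, alphabet, char_dim=4, hidden_dim=6, word_dim=5, seed=0):
        rng = random.Random(seed)
        # One vector per character type: the only lookup table in the model.
        self.char_vecs = {ch: [rng.uniform(-0.1, 0.1) for _ in range(char_dim)]
                          for ch in alphabet}
        # Fixed composition parameters, shared by every word.
        self.fwd = LSTMCell(char_dim, hidden_dim, rng)
        self.bwd = LSTMCell(char_dim, hidden_dim, rng)
        self.proj_f = rand_matrix(word_dim, hidden_dim, rng)
        self.proj_b = rand_matrix(word_dim, hidden_dim, rng)
        self.bias = [0.0] * word_dim

    def embed(self, word):
        chars = [self.char_vecs[ch] for ch in word]
        h_f = self.fwd.run(chars)        # left-to-right pass
        h_b = self.bwd.run(chars[::-1])  # right-to-left pass
        # Assumed combination: linear projection of both final states.
        return [a + b + c for a, b, c in zip(matvec(self.proj_f, h_f),
                                             matvec(self.proj_b, h_b),
                                             self.bias)]


model = CharToWord("abcdefghijklmnopqrstuvwxyz")
vec = model.embed("evler")  # a word never stored in any word-level table
print(len(vec))
```

Because the model stores no per-word parameters, its size does not grow with the vocabulary, and a word made of known characters never falls back to an unknown-word token. A character outside the alphabet raises a `KeyError` in this sketch; a real system would need a policy for that case.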

Code Implementations: 1 repo
