CL, Nov 6, 2018

Effective Subword Segmentation for Text Comprehension

arXiv:1811.02364v2, 30 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific issue in natural language processing for improving text understanding models, but it is incremental as it builds on existing subword methods.

The paper tackles the problem of representing rare or complex words in text comprehension by proposing a subword-augmented embedding framework. The authors report that it significantly improves on their baselines on English and Chinese benchmarks.

Representation learning is the foundation of machine reading comprehension and inference. In state-of-the-art models, character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, a character is not a natural minimal linguistic unit for representation or for composing word embeddings, because it ignores the linguistic coherence of consecutive characters inside a word. This paper presents a general subword-augmented embedding framework for learning and composing computationally derived subword-level representations. We survey a series of unsupervised segmentation methods for subword acquisition and different subword-augmented strategies for text understanding, showing that subword-augmented embedding significantly improves our baselines in various types of text understanding tasks on both English and Chinese benchmarks.
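
To make "unsupervised segmentation for subword acquisition" concrete, here is a minimal sketch of byte pair encoding (BPE), one widely known method of this kind. The abstract does not name the methods the paper surveys, so the choice of BPE is an assumption for illustration. The toy word counts and the function names `learn_bpe_merges`, `apply_merge`, and `segment` are likewise hypothetical and not taken from the paper or its code.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe_merges(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} dictionary."""
    # Every word starts as a sequence of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair; ties are broken by the larger pair
        # in lexicographic order so the result is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        vocab = {tuple(apply_merge(list(s), best)): f for s, f in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Toy corpus statistics (hypothetical, for illustration only).
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe_merges(word_freqs, num_merges=4)
print(segment("lowest", merges))
```

With these toy counts, the unseen word "lowest" is split into units longer than single characters, which is the kind of intermediate granularity between characters and whole words that the abstract argues for. How such subword units are then embedded and combined with word embeddings is what the paper's augmentation strategies address, and is not shown here.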

Code Implementations: 1 repo
