CL Jun 10, 2018

A Structured Variational Autoencoder for Contextual Morphological Inflection

arXiv:1806.03746v2 1099 citations
Originality: Incremental advance
AI Analysis

This paper addresses a challenge in natural language processing for low-resource languages by improving inflection generation. The advance is incremental, since it builds on existing variational autoencoder methods.

The paper asks how raw, token-level data can be used to improve statistical morphological inflectors in a semi-supervised setting. In experiments on 23 languages, it reports improvements of over 10% absolute accuracy in some cases.

Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. To enable posterior inference over the latent variables, we derive an efficient variational inference procedure based on the wake-sleep algorithm. We experiment on 23 languages, using the Universal Dependencies corpora in a simulated low-resource setting, and find improvements of over 10% absolute accuracy in some cases.
