CL · Jun 10, 2018

A Structured Variational Autoencoder for Contextual Morphological Inflection

Lawrence Wolf-Sonkin, Jason Naradowsky, Sabrina J. Mielke, Ryan Cotterell

arXiv:1806.03746v2

Originality: Incremental advance

AI Analysis

This work addresses inflection generation for low-resource languages in natural language processing. Its originality is rated incremental because it builds on existing variational autoencoder methods.

The paper improves statistical morphological inflectors by exploiting raw, token-level data in a semi-supervised setting. In experiments on 23 languages, it reports improvements of over 10% absolute accuracy in some cases.

Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. To enable posterior inference over the latent variables, we derive an efficient variational inference procedure based on the wake-sleep algorithm. We experiment on 23 languages, using the Universal Dependencies corpora in a simulated low-resource setting, and find improvements of over 10% absolute accuracy in some cases.
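The abstract names the wake-sleep algorithm without describing it. For orientation, the block below gives the generic textbook form of wake-sleep, not the paper's specific derivation. The notation is assumed for illustration: x is observed data, z the latent variables, p_θ the generative model, q_φ the recognition (inference) network, and η a step size. The wake phase fits the generative model to real data, using latents inferred by q_φ. The sleep phase trains q_φ on samples "dreamed" from the generative model.

```latex
% Evidence lower bound (ELBO) that the generative side targets
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x, z) - \log q_\phi(z \mid x)\right]

% Wake phase: observed x, latents sampled from the recognition model.
% Since q_\phi does not depend on \theta, this is the ELBO gradient w.r.t. \theta.
\theta \leftarrow \theta + \eta\, \nabla_\theta\, \mathbb{E}_{z \sim q_\phi(z \mid x)}\!\left[\log p_\theta(x, z)\right]

% Sleep phase: (\tilde{x}, \tilde{z}) sampled jointly from the generative model.
% Equivalent to minimizing \mathbb{E}_{p_\theta(x)}\,\mathrm{KL}\big(p_\theta(z \mid x)\,\|\,q_\phi(z \mid x)\big)
\phi \leftarrow \phi + \eta\, \nabla_\phi\, \mathbb{E}_{(\tilde{x}, \tilde{z}) \sim p_\theta(x, z)}\!\left[\log q_\phi(\tilde{z} \mid \tilde{x})\right]
```

The sleep phase minimizes the KL divergence in the reverse direction from the one in the ELBO. This avoids differentiating through samples of discrete latent variables.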

