CL · May 17, 2017

Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models

arXiv:1705.06106v2 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of limited labeled data in natural language processing for morphological generation, offering a semi-supervised approach that is incremental but effective for specific languages.

The paper tackles morphological reinflection by training character-based sequence-to-sequence models with unlabeled data and multi-task learning, reporting improvements of up to 9.9% over state-of-the-art baselines across 8 languages.

We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another. This is achieved by using unlabeled tokens or random strings as training data for an autoencoding task, adapting a network for morphological reinflection, and performing multi-task training. We thus use limited labeled data more effectively, obtaining up to 9.9% improvement over state-of-the-art baselines for 8 different languages.
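The data side of this multi-task setup can be sketched in a few lines. The sketch below is illustrative only and is not taken from the paper: the `<AE>` marker, the tag encoding, the helper names, and the random-string length range are all assumptions. It shows one plausible way to mix labeled reinflection pairs with autoencoding pairs built from unlabeled words or random strings, so that a single character-level encoder-decoder could be trained on both.

```python
import random

# Hypothetical marker telling the model "copy the input"; the paper's actual
# encoding of the autoencoding task may differ.
AUTOENCODE_TAG = "<AE>"


def reinflection_example(source_form, target_tags, target_form):
    """Labeled example: target tags + source characters -> target characters."""
    src = [f"<{tag}>" for tag in target_tags] + list(source_form)
    return src, list(target_form)


def autoencoding_example(word):
    """Unlabeled example: the model must reproduce its input string."""
    return [AUTOENCODE_TAG] + list(word), list(word)


def random_string(alphabet, min_len, max_len, rng):
    """Random character string drawn from the language's alphabet."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))


def build_multitask_data(labeled, unlabeled_words, n_random, seed=0):
    """Mix reinflection and autoencoding examples into one shuffled training set.

    labeled: list of (source_form, target_tags, target_form) triples.
    unlabeled_words: list of plain word forms without annotation.
    n_random: number of random strings to add as extra autoencoding examples.
    """
    rng = random.Random(seed)
    # Alphabet is taken from the labeled data (an assumption of this sketch).
    alphabet = sorted({ch for src, _, tgt in labeled for ch in src + tgt})
    data = [reinflection_example(s, tags, t) for s, tags, t in labeled]
    data += [autoencoding_example(w) for w in unlabeled_words]
    data += [
        autoencoding_example(random_string(alphabet, 3, 8, rng))
        for _ in range(n_random)
    ]
    rng.shuffle(data)
    return data
```

For instance, `build_multitask_data([("walk", ["V", "PST"], "walked")], ["house"], n_random=2)` yields four source/target pairs: one reinflection pair and three copy pairs. Each pair could then be fed to any character-level sequence-to-sequence trainer; the model architecture itself is outside the scope of this sketch.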
