CL, May 17, 2017

Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models

arXiv:1705.06106v25.921 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of limited labeled data in natural language processing for morphological generation, offering a semi-supervised approach that is incremental but effective for specific languages.

The paper tackles the problem of morphological reinflection by using unlabeled data and multi-task training with character-based sequence-to-sequence models, achieving up to 9.9% improvement over state-of-the-art baselines across 8 languages.

We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another. This is achieved by using unlabeled tokens or random strings as training data for an autoencoding task, adapting a network for morphological reinflection, and performing multi-task training. We thus use limited labeled data more effectively, obtaining up to 9.9% improvement over state-of-the-art baselines for 8 different languages.
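The data side of this setup can be illustrated with a minimal sketch: labeled reinflection pairs are mixed with autoencoding examples built from unlabeled tokens or random strings, so a single character-level encoder-decoder could be trained on both tasks. The `<COPY>` marker, the practice of prepending tags to the input characters, and all function names below are assumptions made for illustration; the abstract does not specify the paper's actual input encoding or mixing ratio.

```python
import random

# Hypothetical marker telling the model to copy its input (autoencoding task).
AUTOENCODE_TAG = "<COPY>"


def reinflection_example(source_form, target_form, target_tags):
    """Labeled example: target tags + source characters -> target characters."""
    return list(target_tags) + list(source_form), list(target_form)


def autoencoding_example(token):
    """Unlabeled example: the model must reproduce the input string."""
    return [AUTOENCODE_TAG] + list(token), list(token)


def random_string(alphabet, rng, min_len=3, max_len=10):
    """Sample a random character string over the language's alphabet."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))


def build_multitask_data(labeled, unlabeled_tokens, n_random, alphabet, seed=0):
    """Mix reinflection and autoencoding examples into one shuffled training set."""
    rng = random.Random(seed)
    data = [reinflection_example(s, t, tags) for s, t, tags in labeled]
    data += [autoencoding_example(tok) for tok in unlabeled_tokens]
    data += [autoencoding_example(random_string(alphabet, rng))
             for _ in range(n_random)]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    labeled = [("walk", "walked", ["V", "PST"])]
    for src, tgt in build_multitask_data(labeled, ["tree"], 2, "abcdefgh"):
        print(" ".join(src), "->", " ".join(tgt))
```

Both tasks share one input/output format, so the same network can consume the mixed set without architectural changes; only the leading symbols distinguish copying from reinflection.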
