CL · Oct 9, 2019

Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity

arXiv:1910.04196v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of scaling dialogue systems for developers. It is an incremental advance: it builds on existing self-training methods and adds diversity optimization for selecting training data.

The paper tackles the challenge of efficiently expanding functionalities in task-oriented dialogue systems by proposing functionality-specific semi-supervised learning via self-training. It further shows that diversity-optimizing selection of the augmented utterances can, in many cases, cut the training data to 50% with little impact on performance.
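The abstract does not spell out the self-training loop, so the following is only a minimal sketch under stated assumptions. It assumes that "functionality-targeted" augmentation means keeping only those unlabeled utterances that the current model assigns to the new functionality with high confidence, then retraining on the enlarged set. The names `self_train` and `toy_train`, the `threshold` value, and the intent labels are all hypothetical; `toy_train` is a keyword-overlap stand-in for a real NLU model.

```python
# Sketch only: the predictor interface, threshold, and toy model are hypothetical.
from typing import Callable, Optional

Example = tuple[str, str]  # (utterance, label)
Predictor = Callable[[str], tuple[Optional[str], float]]


def toy_train(data: list[Example]) -> Predictor:
    """Hypothetical stand-in model: label = the one whose vocabulary overlaps most."""
    vocab: dict[str, set[str]] = {}
    for utt, label in data:
        vocab.setdefault(label, set()).update(utt.lower().split())

    def predict(utt: str) -> tuple[Optional[str], float]:
        tokens = utt.lower().split()
        best_label, best_score = None, 0.0
        for label, words in sorted(vocab.items()):
            # Confidence = fraction of tokens seen in this label's vocabulary.
            score = sum(t in words for t in tokens) / max(len(tokens), 1)
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score

    return predict


def self_train(labeled: list[Example], unlabeled: list[str],
               train: Callable[[list[Example]], Predictor], target: str,
               threshold: float = 0.9, rounds: int = 3) -> list[Example]:
    """Iteratively pseudo-label unlabeled utterances for one target functionality."""
    data, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(data)
        added = []
        for utt in pool:
            label, conf = model(utt)
            # Functionality-targeted filter: keep only confident target predictions.
            if label == target and conf >= threshold:
                added.append((utt, label))
        if not added:
            break  # nothing new passed the filter; stop early
        data.extend(added)
        taken = {u for u, _ in added}
        pool = [u for u in pool if u not in taken]
    return data
```

Usage, with made-up data:

```python
labeled = [("play some jazz", "PlayMusic"), ("what is the weather", "GetWeather")]
unlabeled = ["play jazz", "play some rock", "some rock songs", "weather today"]
result = self_train(labeled, unlabeled, toy_train, "PlayMusic", threshold=0.6)
# result[2:] → [("play jazz", "PlayMusic"), ("play some rock", "PlayMusic"),
#              ("some rock songs", "PlayMusic")]
```

"some rock songs" is only accepted in the second round, after "play some rock" has taught the model the word "rock". This shows how each round can extend coverage of the target functionality, and also why a confidence threshold matters: a low threshold would let errors compound across rounds.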

Expanding new functionalities efficiently is an ongoing challenge for single-turn task-oriented dialogue systems. In this work, we explore functionality-specific semi-supervised learning via self-training. We consider methods that automatically augment training data from unlabeled data sets in a functionality-targeted manner. In addition, we examine multiple techniques for efficiently selecting augmented utterances to reduce training time and increase diversity. First, we consider paraphrase detection methods that attempt to find utterance variants of labeled training data with good coverage. Second, we explore submodular optimization based on n-gram features for utterance selection. Experiments show that functionality-specific self-training is very effective for improving system performance. In addition, methods optimizing diversity can in many cases reduce the training data to 50% with little impact on performance.
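The abstract names submodular optimization over n-gram features but does not give the objective. A minimal sketch, assuming a common feature-based submodular function, f(S) = Σ_u sqrt(c_u(S)), where c_u(S) is the total count of n-gram u over the selected utterances S, maximized greedily under a budget. The objective, the unigram-plus-bigram feature set, and the names `ngrams`, `coverage`, and `greedy_select` are illustrative assumptions, not the paper's implementation.

```python
# Sketch only: objective and feature set are assumptions, not the paper's code.
import math
from collections import Counter


def ngrams(utterance: str, n_max: int = 2) -> Counter:
    """Count word n-grams of length 1..n_max in a lowercased utterance."""
    tokens = utterance.lower().split()
    feats: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def coverage(counts: Counter) -> float:
    """Feature-based submodular score: sum over n-grams of sqrt(total count)."""
    return sum(math.sqrt(c) for c in counts.values())


def greedy_select(pool: list[str], budget: int, n_max: int = 2) -> list[str]:
    """Greedily pick up to `budget` utterances with the largest marginal gain in f."""
    feats = [ngrams(u, n_max) for u in pool]
    selected: list[int] = []
    totals: Counter = Counter()
    current = 0.0
    remaining = set(range(len(pool)))
    while remaining and len(selected) < budget:
        best, best_gain = None, 0.0
        for i in sorted(remaining):
            gain = coverage(totals + feats[i]) - current
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no candidate adds any value (e.g. empty utterances)
            break
        selected.append(best)
        totals += feats[best]
        current = coverage(totals)
        remaining.remove(best)
    return [pool[i] for i in selected]
```

Usage, with made-up utterances:

```python
pool = ["play some jazz", "play some jazz please", "put on jazz music", "play some jazz"]
greedy_select(pool, 2)  # → ["play some jazz please", "put on jazz music"]
```

Because sqrt is concave, each additional occurrence of an n-gram that is already covered adds less than the previous one. That diminishing return is what makes f submodular, and it is why the near-duplicate "play some jazz" loses to the lexically different "put on jazz music". For monotone submodular functions under a cardinality budget, this plain greedy procedure is guaranteed to reach at least (1 − 1/e) of the optimal value.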

