CL · LG · ML · Oct 4, 2019

Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents

arXiv:1910.03487v11014 citations
AI Analysis

This work addresses data scarcity for developers of intelligent artificial agents, though the contribution is incremental because it builds on existing text generation techniques.

The paper tackles the problem of limited training data for developing new capabilities in intelligent artificial agents by using controlled text generation for data augmentation, resulting in up to 5% absolute f-score improvement in intent classification tasks in low-resource scenarios.

Data availability is a bottleneck during early stages of development of new capabilities for intelligent artificial agents. We investigate the use of text generation techniques to augment the training data of a popular commercial artificial agent across categories of functionality, with the goal of faster development of new functionality. We explore a variety of encoder-decoder generative models for synthetic training data generation and propose using conditional variational auto-encoders. Our approach requires only direct optimization, works well with limited data and significantly outperforms the previous controlled text generation techniques. Further, the generated data are used as additional training samples in an extrinsic intent classification task, leading to improved performance by up to 5% absolute f-score in low-resource cases, validating the usefulness of our approach.
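
For orientation, the objective usually associated with a conditional variational auto-encoder can be written as below. This is the textbook form, given here as an assumption; the abstract does not state the exact loss the authors use. The symbols are illustrative and not taken from the paper: x is a training utterance, c is the conditioning signal (for example an intent label), z is the latent variable, and θ and φ are the decoder and encoder parameters.

```latex
% Standard CVAE evidence lower bound (assumed textbook form, not the paper's stated loss)
\log p_\theta(x \mid c) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[ \log p_\theta(x \mid z, c) \right]
  \;-\; \mathrm{KL}\!\left( q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c) \right)
```

Under this reading, the first term rewards reconstructing the utterance from z and c, and the KL term keeps the encoder's posterior close to the conditional prior, so that new utterances for a given c can be generated by sampling z from that prior.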
