CL · Apr 30, 2020

Control, Generate, Augment: A Scalable Framework for Multi-Attribute Text Generation

arXiv:2004.14983v2 · 1003 citations
AI Analysis

This work addresses the need for scalable multi-attribute text generation, particularly for data augmentation in NLP. Its contribution is incremental: it builds on existing VAE and adversarial methods.

The paper tackles the problem of generating text with control over multiple semantic and syntactic attributes. The resulting model produces high-quality, diverse sentences and significantly improves performance on a downstream NLP task, often matching the gains from adding the same amount of real data.

We introduce CGA, a conditional VAE architecture, to control, generate, and augment text. CGA is able to generate natural English sentences controlling multiple semantic and syntactic attributes by combining adversarial learning with a context-aware loss and a cyclical word dropout routine. We demonstrate the value of the individual model components in an ablation study. The scalability of our approach is ensured through a single discriminator, independently of the number of attributes. We show high quality, diversity and attribute control in the generated sentences through a series of automatic and human assessments. As the main application of our work, we test the potential of this new NLG model in a data augmentation scenario. In a downstream NLP task, the sentences generated by our CGA model show significant improvements over a strong baseline, and a classification performance often comparable to adding the same amount of additional real data.
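The abstract names a cyclical word dropout routine but does not specify it. The following is a minimal sketch of what such a routine could look like, not the authors' implementation: the cosine shape, the `period` and `max_rate` parameters, the `<unk>` placeholder, and both function names are assumptions made for illustration only.

```python
import math
import random


def cyclical_dropout_rate(step, period=1000, max_rate=0.7):
    """Hypothetical schedule (an assumption, not taken from the paper).

    The rate starts at max_rate, falls to 0 halfway through each cycle,
    and returns to max_rate every `period` training steps.
    """
    return max_rate * abs(math.cos(math.pi * step / period))


def word_dropout(tokens, rate, unk="<unk>", rng=random):
    """Replace each token with `unk` independently with probability `rate`."""
    return [unk if rng.random() < rate else tok for tok in tokens]


if __name__ == "__main__":
    sentence = "the movie was surprisingly good".split()
    for step in (0, 250, 500, 750, 1000):
        rate = cyclical_dropout_rate(step)
        print(step, round(rate, 3), word_dropout(sentence, rate))
```

In a VAE text decoder, masking input tokens this way is a common means of forcing the decoder to rely on the latent code rather than on the previous words; varying the rate in cycles alternates between harder and easier reconstruction phases.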

Code Implementations: 1 repo