CL · Aug 25, 2020

ETC-NLG: End-to-end Topic-Conditioned Natural Language Generation

arXiv:2008.10875v3 · 10 citations
AI Analysis

This work addresses the challenge of generating topic-specific text for languages or domains with limited labeled data, offering an alternative to methods that depend on large labeled datasets.

The paper addresses topic-conditioned natural language generation in low-resource settings, where existing methods such as PPLMs require large amounts of labeled text. It presents ETC-NLG, an unsupervised approach that uses topic modeling annotations to condition generation on topics that emerge from unlabeled document collections. The approach is first tested on Italian, then compared across Italian and English using a parallel corpus, and the authors also propose an automatic way to estimate how well the conditioning worked.

Plug-and-play language models (PPLMs) enable topic-conditioned natural language generation by pairing large pre-trained generators with attribute models used to steer the predicted token distribution towards the selected topic. Despite their computational efficiency, PPLMs require large amounts of labeled texts to effectively balance generation fluency and proper conditioning, making them unsuitable for low-resource settings. We present ETC-NLG, an approach leveraging topic modeling annotations to enable fully-unsupervised End-to-end Topic-Conditioned Natural Language Generation over emergent topics in unlabeled document collections. We first test the effectiveness of our approach in a low-resource setting for Italian, evaluating the conditioning for both topic models and gold annotations. We then perform a comparative evaluation of ETC-NLG for Italian and English using a parallel corpus. Finally, we propose an automatic approach to estimate the effectiveness of conditioning on the generated utterances.
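
The idea of steering a predicted token distribution towards a topic can be illustrated with a toy sketch. This is not the paper's method and not how PPLMs are implemented: it swaps in a much simpler technique, adding a fixed bonus to the logits of topic words, purely to show what "shifting the distribution towards a topic" means. The vocabulary, the logit values, the topic word set, and the names `steer`, `topic_mass`, and `strength` are all hypothetical.

```python
import math

def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def steer(logits, topic_words, strength=2.0):
    """Toy steering: add a fixed bonus to the logit of every topic word.

    Assumption: this stands in for the attribute model's signal. It is a
    simplification for illustration, not the PPLM update rule.
    """
    return {
        tok: v + (strength if tok in topic_words else 0.0)
        for tok, v in logits.items()
    }

def topic_mass(probs, topic_words):
    """Total probability assigned to the topic's words."""
    return sum(p for tok, p in probs.items() if tok in topic_words)

# Hypothetical next-token logits from a generator over a tiny vocabulary.
base_logits = {"the": 2.0, "goal": 0.5, "match": 0.3, "recipe": 0.4, "oven": 0.1}
sports_topic = {"goal", "match"}  # hypothetical topic word set

base_probs = softmax(base_logits)
steered_probs = softmax(steer(base_logits, sports_topic))

print("topic mass before:", round(topic_mass(base_probs, sports_topic), 3))
print("topic mass after: ", round(topic_mass(steered_probs, sports_topic), 3))
```

Raising `strength` moves more probability onto topic words at the expense of the generator's original preferences, which is a small-scale picture of the trade-off between fluency and conditioning that the abstract mentions.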

Code Implementations: 1 repo