CL LG, Nov 4, 2018

Char2char Generation with Reranking for the E2E NLG Challenge

arXiv:1811.05826v11104 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of rare word handling in NLG for researchers and practitioners, offering a simplified approach with incremental improvements.

The paper tackles rare-word handling in neural natural language generation by training a character-level seq2seq model that needs no pre- or post-processing steps, and it reports surprisingly good results. It also explores two re-ranking approaches for scoring candidates and introduces a method for creating synthetic datasets.

This paper describes our submission to the E2E NLG Challenge. Recently, neural seq2seq approaches have become mainstream in NLG, often resorting to pre- (respectively post-) processing delexicalization (relexicalization) steps at the word-level to handle rare words. By contrast, we train a simple character level seq2seq model, which requires no pre/post-processing (delexicalization, tokenization or even lowercasing), with surprisingly good results. For further improvement, we explore two re-ranking approaches for scoring candidates. We also introduce a synthetic dataset creation procedure, which opens up a new way of creating artificial datasets for Natural Language Generation.
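As a rough illustration of why a character-level model can skip delexicalization, the sketch below encodes a raw meaning representation as a sequence of character ids. It is a minimal sketch and not the authors' code: the sample strings, the special symbols, and the helper names (`build_char_vocab`, `encode`, `decode`) are all assumptions made for the example.

```python
# Hypothetical E2E-style input and reference text (illustrative, not from the paper).
mr = "name[Zizzi], eatType[pub], area[riverside]"
text = "Zizzi is a pub by the riverside."

SPECIALS = ["<pad>", "<bos>", "<eos>"]  # assumed special symbols


def build_char_vocab(strings):
    """Map every character seen in the data to an integer id."""
    chars = sorted({ch for s in strings for ch in s})
    return {sym: i for i, sym in enumerate(SPECIALS + chars)}


def encode(s, vocab):
    """Turn a raw string into ids; no tokenization or lowercasing is applied."""
    return [vocab["<bos>"]] + [vocab[ch] for ch in s] + [vocab["<eos>"]]


def decode(ids, vocab):
    """Invert encode(), dropping the special symbols."""
    inv = {i: sym for sym, i in vocab.items()}
    return "".join(inv[i] for i in ids if inv[i] not in SPECIALS)


vocab = build_char_vocab([mr, text])
ids = encode(mr, vocab)
roundtrip = decode(ids, vocab)

# A name absent from the data is still encodable, because its characters are known.
unseen = "name[Zippi]"
unseen_is_encodable = all(ch in vocab for ch in unseen)

# A word-level vocabulary built from the same text has no entry for that name.
word_vocab = set(text.replace(".", "").split())
unseen_word_known = "Zippi" in word_vocab
```

Here casing and punctuation survive the round trip unchanged, and the unseen name needs no placeholder substitution, whereas the word-level vocabulary would have to fall back on an unknown-word token or on delexicalization.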

Code Implementations: 1 repo