Data-to-Text Generation with Iterative Text Editing
The work addresses the problem of generating fluent and accurate text from structured data for NLP applications, although the contribution is incremental because it builds on existing pre-trained models.
The paper tackles data-to-text generation by using iterative text editing to improve output completeness and semantic accuracy, achieving competitive results on WebNLG and Cleaned E2E datasets and enabling zero-shot domain adaptation.
We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve text fluency. To this end, we first transform data items to text using trivial templates, and then we iteratively improve the resulting text with a neural model trained for the sentence fusion task. The output of the model is filtered by a simple heuristic and reranked with an off-the-shelf pre-trained language model. We evaluate our approach on two major data-to-text datasets (WebNLG, Cleaned E2E) and analyze its caveats and benefits. Furthermore, we show that our formulation of data-to-text generation opens up the possibility of zero-shot domain adaptation using a general-domain dataset for sentence fusion.
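
The pipeline described above (templates, iterative fusion, heuristic filtering, language-model reranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The templates, the `toy_fuse` function standing in for the sentence fusion model (LaserTagger in the paper), and the `toy_lm_score` function standing in for the language model (GPT-2 in the paper) are all hypothetical placeholders. The filter shown here, a check that every entity string still appears in the candidate, is only one plausible reading of "a simple heuristic".

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical single-predicate templates; the real ones are dataset-specific.
TEMPLATES = {
    "birthPlace": "{s} was born in {o}.",
    "employer": "{s} works for {o}.",
}


def lexicalize(triple: Triple) -> str:
    """Turn one data item into a sentence with a trivial template."""
    s, p, o = triple
    return TEMPLATES[p].format(s=s, o=o)


def keeps_entities(candidate: str, entities: List[str]) -> bool:
    """Assumed filter: reject candidates that drop any entity string."""
    return all(e in candidate for e in entities)


def generate(
    triples: List[Triple],
    fuse: Callable[[str], List[str]],
    lm_score: Callable[[str], float],
) -> str:
    """Add one data item per step and try to fuse it into the running text."""
    text = lexicalize(triples[0])
    entities = [triples[0][0], triples[0][2]]
    for triple in triples[1:]:
        # Plain concatenation is the safe fallback if no candidate survives.
        fallback = text + " " + lexicalize(triple)
        entities += [triple[0], triple[2]]
        candidates = [c for c in fuse(fallback) if keeps_entities(c, entities)]
        # Rerank surviving candidates with the language-model score.
        text = max(candidates, key=lm_score) if candidates else fallback
    return text


def toy_fuse(text: str) -> List[str]:
    """Stand-in for a fusion model; assumes a single-word repeated subject."""
    head, tail = text.rsplit(". ", 1)
    _subject, rest = tail.split(" ", 1)
    truncated = head + "."          # bad candidate: drops the new fact
    fused = f"{head} and {rest}"    # good candidate: merges both sentences
    return [truncated, fused]


def toy_lm_score(text: str) -> float:
    """Stand-in for an LM score; simply prefers shorter outputs."""
    return -float(len(text.split()))


if __name__ == "__main__":
    data = [("Alice", "birthPlace", "Paris"), ("Alice", "employer", "Acme")]
    print(generate(data, toy_fuse, toy_lm_score))
```

In this sketch the fallback guarantees that every fact appears in the output even when the fusion model fails, which mirrors the stated goal of maximizing completeness and semantic accuracy. Fluency gains depend entirely on the quality of the fusion and scoring components that are passed in.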