CL Oct 19, 2019

Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation

Ran Tian, Shashi Narayan, Thibault Sellam, Ankur P. Parikh

arXiv:1910.08684v38.4102 citations

Originality: Incremental advance

AI Analysis

This paper addresses the problem of unsupported text generated by data-to-text systems. The advance is incremental but important for applications that require output faithful to the source data.

The paper tackles hallucination in data-to-text generation by proposing a confidence score and variational Bayes training to ensure the model attends to the source, resulting in improved faithfulness over state-of-the-art approaches on the WikiBio dataset as measured by PARENT score and human evaluation.

We address the issue of hallucination in data-to-text generation, i.e., reducing the generation of text that is unsupported by the source. We conjecture that hallucination can be caused by an encoder-decoder model generating content phrases without attending to the source; so we propose a confidence score to ensure that the model attends to the source whenever necessary, as well as a variational Bayes training framework that can learn the score from data. Experiments on the WikiBio (Lebret et al., 2016) dataset show that our approach is more faithful to the source than existing state-of-the-art approaches, according to both PARENT score (Dhingra et al., 2019) and human evaluation. We also report strong results on the WebNLG (Gardent et al., 2017) dataset.
