CL · Jan 12, 2020

Revisiting Challenges in Data-to-Text Generation with Fact Grounding

arXiv:2001.03830v11012 citations
AI Analysis

This work addresses data fidelity challenges for researchers in data-to-text generation, but it is incremental as it builds on existing datasets and methods.

The paper tackles factual hallucination in data-to-text generation. It traces the problem to information deficiency in the RotoWire corpus, where only about 60% of summary content can be grounded to the input data, and introduces a purified dataset, RotoWire-FG, with 50% more data and enriched input tables. It also reports improved data fidelity over state-of-the-art models by adding table reconstruction as an auxiliary task.

Data-to-text generation models face challenges in ensuring data fidelity by referring to the correct input source. To inspire studies in this area, Wiseman et al. (2017) introduced the RotoWire corpus on generating NBA game summaries from box- and line-score tables. However, few attempts have been made in this direction and the challenges remain. We observe a prominent bottleneck in the corpus: only about 60% of the summary contents can be grounded to the boxscore records. Such information deficiency tends to misguide a conditioned language model into producing unconditioned random facts and thus leads to factual hallucinations. In this work, we restore the information balance and revamp this task to focus on fact-grounded data-to-text generation. We introduce a purified and larger-scale dataset, RotoWire-FG (Fact-Grounding), with 50% more data from the years 2017-19 and enriched input tables, hoping to attract more research focus in this direction. Moreover, we achieve improved data fidelity over the state-of-the-art models by integrating a new form of table reconstruction as an auxiliary task to boost the generation quality.
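To make the "about 60% grounded" observation concrete, here is a minimal sketch of how a grounded fraction could be estimated for one summary. It is not the paper's procedure: the function name `grounded_fraction`, the exact-match rule, and the toy records (`Player A`, `Team X`) are all assumptions made for illustration.

```python
import re
from typing import Dict


def grounded_fraction(summary: str, records: Dict[str, Dict[str, int]]) -> float:
    """Share of numeric mentions in `summary` that match a table value.

    Assumed rule (hypothetical, for illustration only): a number is grounded
    when an entity named in the same sentence has a record with that value.
    """
    total = 0
    grounded = 0
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        numbers = [int(n) for n in re.findall(r"\b\d+\b", sentence)]
        # Collect table values for every entity mentioned in this sentence.
        values = set()
        for entity, stats in records.items():
            if entity in sentence:
                values.update(stats.values())
        total += len(numbers)
        grounded += sum(1 for n in numbers if n in values)
    return grounded / total if total else 0.0


# Toy input: invented records, not taken from RotoWire.
records = {
    "Player A": {"PTS": 30, "REB": 8, "AST": 11},
    "Team X": {"PTS": 112},
}
summary = (
    "Player A scored 30 points and grabbed 8 rebounds. "
    "Team X has won 5 straight games."
)
fraction = grounded_fraction(summary, records)
```

In this toy case the two numbers in the first sentence match Player A's records, while the winning streak in the second sentence has no counterpart in the table, so two of three numeric mentions count as grounded. Content of that second kind is what the abstract describes as ungrounded.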
