CL · Oct 21, 2020

PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation

arXiv:2010.10866v2993 citations
Originality: Incremental advance
AI Analysis

This work addresses dataset divergence issues in language generation for structured data, offering an incremental improvement over previous reinforcement learning approaches.

The paper tackles the problem of hallucinations and omissions in data-to-text generation by proposing a model-agnostic reinforcement learning framework based on the PARENT metric, which reduces these errors effectively on WikiBIO and WebNLG benchmarks compared to state-of-the-art models.

In language generation models conditioned on structured data, classical training via maximum likelihood almost always leads models to pick up on dataset divergences (i.e., hallucinations or omissions) and to erroneously reproduce them in their own generations at inference. In this work, we build on top of previous Reinforcement Learning-based approaches and show that a model-agnostic framework relying on the recently introduced PARENT metric is efficient at reducing both hallucinations and omissions. Evaluations on the widely used WikiBIO and WebNLG benchmarks demonstrate the effectiveness of this framework compared to state-of-the-art models.
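To make the idea of using a faithfulness metric as a reinforcement learning reward more concrete, here is a minimal sketch. It rests on several assumptions that are not taken from the abstract: `toy_overlap_reward` is a hypothetical, heavily simplified token-overlap score that only loosely mimics the precision/recall spirit of PARENT (it is not the real metric), and `reinforce_loss` is a generic REINFORCE-style loss with a baseline, which is one common choice and not necessarily the update rule used in the paper.

```python
def toy_overlap_reward(generated, reference, table_values):
    """Hypothetical stand-in for a PARENT-like score (NOT the real metric).

    Precision: share of generated tokens supported by the reference or the table
    (low precision suggests hallucination).
    Recall: share of table values mentioned in the generation
    (low recall suggests omission).
    Returns the harmonic mean (F1) of the two.
    """
    if not generated or not table_values:
        return 0.0
    supported = set(reference) | set(table_values)
    precision = sum(tok in supported for tok in generated) / len(generated)
    generated_set = set(generated)
    recall = sum(val in generated_set for val in table_values) / len(table_values)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def reinforce_loss(token_log_probs, reward, baseline):
    """Generic REINFORCE-style loss with a baseline (an assumed training rule).

    Minimizing this loss raises the log-probability of a sampled sequence
    whose reward exceeds the baseline, and lowers it otherwise.
    """
    return -(reward - baseline) * sum(token_log_probs)


# Illustrative, made-up data: a two-cell table and a reference sentence.
table_values = ["paris", "1950"]
reference = ["born", "in", "paris", "in", "1950"]

faithful = ["born", "in", "paris", "in", "1950"]
hallucinated = ["born", "in", "london", "in", "1950"]

r_faithful = toy_overlap_reward(faithful, reference, table_values)
r_hallucinated = toy_overlap_reward(hallucinated, reference, table_values)

# Made-up per-token log-probabilities for a sampled sequence.
loss = reinforce_loss([-0.1, -0.2], reward=r_faithful, baseline=0.5)
```

In this toy setup the hallucinated sentence scores lower than the faithful one because it both introduces an unsupported token and drops a table value. Because the reward depends only on the generated text, the table, and the reference, this kind of objective can in principle be attached to any generator that exposes token log-probabilities.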

Code Implementations: 1 repo
