CL · Nov 21, 2020

Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference

arXiv:2011.10819v1995 citations
AI Analysis

This work provides a more robust and automated method for evaluating the semantic accuracy of D2T generation systems, which is a critical problem for developers and researchers in natural language generation.

The paper addresses the challenge of evaluating semantic accuracy in data-to-text (D2T) generation by proposing a new metric based on a pretrained natural language inference (NLI) model. This metric checks textual entailment between input data (converted to text) and output text in both directions, effectively identifying omissions and hallucinations, and achieves high accuracy in identifying erroneous system outputs on two D2T datasets.

A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e. checking if the output text contains all and only facts supported by the input data. We propose a new metric for evaluating the semantic accuracy of D2T generation based on a neural model pretrained for natural language inference (NLI). We use the NLI model to check textual entailment between the input data and the output text in both directions, allowing us to reveal omissions or hallucinations. Input data are converted to text for NLI using trivial templates. Our experiments on two recent D2T datasets show that our metric can achieve high accuracy in identifying erroneous system outputs.
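The two-direction check described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `triple_to_text`, `check_output`, and `toy_entails` are hypothetical, the template is an assumed "subject predicate object" pattern, and `toy_entails` is a crude word-overlap stand-in for a pretrained NLI model, used only so the example runs without downloads. In practice, the `entails` argument would wrap a real NLI classifier.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
# (premise, hypothesis) -> True if the premise entails the hypothesis
Entails = Callable[[str, str], bool]


def triple_to_text(triple: Triple) -> str:
    """Trivial template (an assumption): 'subject predicate object.'"""
    subj, pred, obj = triple
    return f"{subj} {pred} {obj}."


def check_output(triples: List[Triple], output: str, entails: Entails) -> Dict:
    """Check entailment in both directions between input facts and output text."""
    facts = [triple_to_text(t) for t in triples]
    # Direction 1: the output should entail every input fact;
    # a fact that is not entailed is treated as an omission.
    omitted = [f for f in facts if not entails(output, f)]
    # Direction 2: the input facts together should entail the output;
    # if they do not, the output contains unsupported content.
    hallucination = not entails(" ".join(facts), output)
    return {
        "omitted": omitted,
        "hallucination": hallucination,
        "ok": not omitted and not hallucination,
    }


def toy_entails(premise: str, hypothesis: str) -> bool:
    """NOT an NLI model: true if every hypothesis word occurs in the premise."""
    def words(s: str) -> set:
        return set(s.lower().replace(".", " ").replace(",", " ").split())
    return words(hypothesis) <= words(premise)


# Hypothetical input data for illustration.
triples = [
    ("Blue Spice", "is a", "pub"),
    ("Blue Spice", "is located in", "riverside"),
]
good = check_output(triples, "Blue Spice is a pub located in riverside.", toy_entails)
short = check_output(triples, "Blue Spice is a pub.", toy_entails)
extra = check_output(triples, "Blue Spice is a cheap pub located in riverside.", toy_entails)
```

With the toy stand-in, `good` passes both checks, `short` reports the location fact as omitted, and `extra` is flagged as a hallucination because "cheap" is not supported by the input. Passing the entailment function as a parameter keeps the checking logic independent of whichever NLI model is plugged in.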
