CL, May 30, 2019

Assessing The Factual Accuracy of Generated Text

arXiv:1905.13322v2, 217 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for better assessment of factual accuracy in text generation, particularly summarization. The contribution is incremental because it builds on existing relation extraction methods.

The paper tackles the problem of evaluating factual accuracy in generated text by proposing a model-based metric, and shows it outperforms traditional metrics like ROUGE through human evaluation on a Wikipedia summarization task.

We propose a model-based metric to estimate the factual accuracy of generated text that is complementary to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset based on Wikipedia and Wikidata to train relation classifiers and end-to-end fact extraction models. The end-to-end models are shown to be able to extract complete sets of facts from datasets with full pages of text. We then analyse multiple models that estimate factual accuracy on a Wikipedia text summarization task, and show their efficacy compared to ROUGE and other model-free variants by conducting a human evaluation study.
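To make the idea of a fact-based score concrete, here is a minimal sketch that assumes facts have already been extracted as (subject, relation, object) triples from both the reference text and the generated text. The name `factual_precision` and the scoring rule (the share of generated triples that also appear in the reference) are illustrative assumptions, not the paper's exact definition, and the extraction step itself is not shown.

```python
from typing import Iterable, Set, Tuple

# A fact is a (subject, relation, object) triple. This representation is an
# assumption made for illustration.
Fact = Tuple[str, str, str]


def factual_precision(generated: Iterable[Fact], reference: Iterable[Fact]) -> float:
    """Fraction of generated facts that also occur in the reference facts.

    Hypothetical scoring rule: exact-match set overlap. Returns 0.0 when the
    generated text yields no facts at all.
    """
    gen: Set[Fact] = set(generated)
    ref: Set[Fact] = set(reference)
    if not gen:
        return 0.0
    return len(gen & ref) / len(gen)


# Toy triples standing in for the output of a fact extraction model.
reference_facts = [
    ("Alice", "born_in", "Paris"),
    ("Alice", "occupation", "chemist"),
]
generated_facts = [
    ("Alice", "born_in", "Paris"),   # supported by the reference
    ("Alice", "born_in", "Berlin"),  # contradicts the reference
]

score = factual_precision(generated_facts, reference_facts)
print(score)
```

Unlike an n-gram overlap score, this kind of measure penalizes a summary that swaps a single entity even when almost all of its wording matches the reference.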

Code Implementations: 2 repos