CL, AI, Oct 11, 2021

Rome was built in 1776: A Case Study on Factual Correctness in Knowledge-Grounded Response Generation

arXiv:2110.05456v2, 40 citations
Originality: Incremental advance
AI Analysis

This paper addresses factual errors in AI-generated responses, which matter for applications such as chatbots and information retrieval. The advance is incremental, since it builds on existing datasets and methods.

The paper examines factual correctness in knowledge-grounded neural response generation models. It introduces a human annotation setup for categorizing responses and creates the Conv-FEVER dataset for training factual consistency detectors. Models trained on this data perform reasonably well at detecting factually inconsistent responses.

Recently, neural response generation models have leveraged large pre-trained transformer models and knowledge snippets to generate relevant and informative responses. However, this does not guarantee that generated responses are factually correct. In this paper, we examine factual correctness in knowledge-grounded neural response generation models. We present a human annotation setup to identify three different response types: responses that are factually consistent with respect to the input knowledge, responses that contain hallucinated knowledge, and non-verifiable chitchat style responses. We use this setup to annotate responses generated using different state-of-the-art models, knowledge snippets, and decoding strategies. In addition, to facilitate the development of a factual consistency detector, we automatically create a new corpus called Conv-FEVER that is adapted from the Wizard of Wikipedia dataset and includes factually consistent and inconsistent responses. We demonstrate the benefit of our Conv-FEVER dataset by showing that the models trained on this data perform reasonably well to detect factually inconsistent responses with respect to the provided knowledge through evaluation on our human annotated data. We will release the Conv-FEVER dataset and the human annotated responses.

Code Implementations: 1 repo
