Fast and Accurate Factual Inconsistency Detection Over Long Documents
This work addresses hallucinations in generative AI over long documents, a problem that is critical for applications such as dialogue systems. The contribution appears incremental, since it builds on existing NLI-based methods, with the chunking strategy as its novel element.
The paper tackles factual inconsistency detection for AI-generated text conditioned on long documents. It introduces SCALE, a model that achieves state-of-the-art performance on diverse tasks and long inputs, as evidenced by evaluations on standard benchmarks and a new dataset.
Generative AI models exhibit remarkable potential; however, hallucinations across various tasks present a significant challenge, particularly for longer inputs that current approaches struggle to address effectively. We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy. Specifically, SCALE is a Natural Language Inference (NLI) based model that uses large text chunks to condition over long texts. This approach achieves state-of-the-art performance in factual inconsistency detection for diverse tasks and long inputs. Additionally, we leverage the chunking mechanism and employ a novel algorithm to explain SCALE's decisions through relevant source sentence retrieval. Our evaluations reveal that SCALE outperforms existing methods on both standard benchmarks and ScreenEval, a new long-form dialogue dataset that we constructed. Moreover, SCALE surpasses competitive systems in efficiency and model explanation evaluations. We have released our code and data publicly on GitHub.
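The chunk-based NLI idea described above can be illustrated with a minimal sketch. This is not the released SCALE implementation. The function names, the greedy word-budget chunking, the `max_words` parameter, and the aggregation rule (maximum over chunks, then mean over claim sentences) are all assumptions made for illustration. The `toy_entail_prob` scorer is a word-overlap stand-in for a real NLI model and only serves to make the example runnable.

```python
from typing import Callable, List


def chunk_sentences(sentences: List[str], max_words: int) -> List[str]:
    """Greedily pack consecutive sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sent in sentences:
        n = len(sent.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def consistency_score(
    source_sentences: List[str],
    claim_sentences: List[str],
    entail_prob: Callable[[str, str], float],
    max_words: int = 400,
) -> float:
    """Score how well the claims are supported by a long source.

    Assumed aggregation (not confirmed by the abstract): each claim takes the
    score of its best-supporting chunk, and the claims are then averaged.
    """
    chunks = chunk_sentences(source_sentences, max_words)
    if not chunks or not claim_sentences:
        raise ValueError("source and claims must both be non-empty")
    per_claim = [
        max(entail_prob(chunk, claim) for chunk in chunks)
        for claim in claim_sentences
    ]
    return sum(per_claim) / len(per_claim)


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: fraction of hypothesis words
    that also appear in the premise. A real system would return the
    entailment probability of a trained NLI model instead."""
    premise_words = set(premise.lower().split())
    hyp_words = hypothesis.lower().split()
    if not hyp_words:
        return 0.0
    return sum(w in premise_words for w in hyp_words) / len(hyp_words)


if __name__ == "__main__":
    source = [
        "the cat sat on the mat",
        "the dog barked loudly",
        "it rained all day",
    ]
    claims = ["the dog barked", "the cat flew"]
    print(chunk_sentences(source, max_words=10))
    print(consistency_score(source, claims, toy_entail_prob, max_words=6))
```

Larger chunks mean fewer premise-hypothesis pairs to score per claim, which is one plausible reading of the efficiency claim in the abstract. The trade-off between chunk size and accuracy is not specified here and would need to be checked against the paper.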