CL, IR · May 19, 2025

SAFE: Improving LLM Systems using Sentence-Level In-generation Attribution

João Eduardo Batista, Emil Vatai, Mohamed Wahib

arXiv:2505.12621v24.91 citations, h-index: 4

Originality: Incremental advance

AI Analysis

The paper addresses the need for trustworthy, verifiable LLM outputs in scientific and high-stakes settings. The advance is incremental, since it builds on existing RAG systems.

The paper tackles unreliable source attribution in LLM-generated outputs by proposing SAFE, a sentence-level attribution framework for RAG systems. SAFE achieved 95% accuracy in predicting the required number of references for a sentence, which improved the accuracy of the tested attribution algorithms by 2.1-6.0% on the clean dataset.

Large Language Models (LLMs) are increasingly applied in various scientific domains, yet their broader adoption remains constrained by a critical challenge: the lack of trustworthy, verifiable outputs. Current LLMs often generate answers without reliable source attribution, or worse, with incorrect attributions, posing a barrier to their use in scientific and high-stakes settings, where traceability and accountability are paramount. To be reliable, attribution systems require high accuracy for short-length attribution on retrieved data, i.e., attribution to a sentence within a document rather than to the entire document. We propose SAFE, a Sentence-level Attribution FramEwork for Retrieval-Augmented Generation (RAG) systems that attributes generated sentences during generation. This allows users to verify sentences as they read them and to correct the model when the attribution indicates that the generated text is not grounded in the documents, increasing the safety of LLM systems. The framework consists of two steps: predicting the required number of references for a sentence, and attributing the sentence. Our approach achieved 95% accuracy in the first step, which translated to 2.1-6.0% improvements in the accuracy (normalized for maximum possible accuracy) of all attribution algorithms in our clean dataset, when compared to their top-1 accuracy. We also applied SAFE in real-world scenarios with documents containing hundreds to thousands of sentences. In these settings, SAFE reliably attributed sentences to their source documents, demonstrating that the method generalizes beyond controlled benchmarks. The SAFE framework and the training dataset are publicly available on GitHub.
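The two-step structure described in the abstract (first predict how many references a sentence needs, then attribute it to that many source sentences) can be illustrated with a minimal Python sketch. This is not the SAFE implementation: the abstract does not specify how either step is carried out, so the Jaccard word-overlap score, the `min_score` threshold, and all function names below are hypothetical stand-ins chosen only to make the control flow concrete.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for a real text representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(generated, source):
    """Jaccard word overlap between two sentences (hypothetical scoring function)."""
    a, b = tokens(generated), tokens(source)
    return len(a & b) / len(a | b) if a and b else 0.0


def predict_num_references(scores, min_score=0.3):
    """Step 1 (stand-in): count source sentences scoring at or above a threshold.

    The abstract does not say how SAFE makes this prediction; this heuristic
    is an assumption used purely for illustration.
    """
    return sum(1 for s in scores if s >= min_score)


def attribute(generated, source_sentences, min_score=0.3):
    """Step 2: return indices of the top-k source sentences, with k from step 1.

    An empty list signals that the sentence does not look grounded in the sources.
    """
    scores = [overlap_score(generated, s) for s in source_sentences]
    k = predict_num_references(scores, min_score)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


sources = [
    "SAFE attributes generated sentences during generation.",
    "The framework predicts the required number of references for a sentence.",
    "Bananas are rich in potassium.",
]

# A sentence closely matching the first source is attributed to it.
print(attribute("SAFE attributes sentences during generation.", sources))
# A sentence unrelated to every source receives no attribution.
print(attribute("The stock market closed higher today.", sources))
```

In this sketch an empty result plays the role of the warning the abstract mentions: the reader sees that the sentence has no supporting source and can correct the model. A real system would replace the overlap score with a stronger attribution algorithm and the threshold with an actual predictor of the reference count.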

