Jointly Generating and Attributing Answers using Logits of Document-Identifier Tokens
This work addresses the trustworthiness of LLM outputs for users who rely on accurate, attributed answers, though the contribution is incremental in that it builds on prior work on faithfulness.
The paper tackles hallucination in Large Language Models by introducing LoDIT, a method that jointly generates and faithfully attributes answers in Retrieval-Augmented Generation. On the Trust-Align benchmark, LoDIT significantly outperforms state-of-the-art models on several metrics.
Despite their impressive performance, Large Language Models (LLMs) remain prone to hallucination, which critically undermines their trustworthiness. While most previous work has focused on answer and attribution correctness, a recent line of work has investigated faithfulness, leveraging internal model signals to reflect a model's actual decision-making process while generating the answer. Nevertheless, these methods introduce additional latency and have shown limitations in directly aligning token generation with attribution generation. In this paper, we introduce LoDIT, a method that jointly generates and faithfully attributes answers in Retrieval-Augmented Generation (RAG) by leveraging the logits of specific tokens during generation. It consists of two steps: (1) marking the documents with specific token identifiers and then leveraging the logits of these tokens to estimate the contribution of each document to the answer during generation, and (2) aggregating these contributions into document attributions. Experiments on Trust-Align, a trustworthiness-focused benchmark for attributed text generation, show that LoDIT significantly outperforms state-of-the-art models on several metrics. Finally, an in-depth analysis of LoDIT shows both its efficiency in terms of latency and its robustness in different settings.
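The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how identifier logits are normalized or aggregated, so the softmax restricted to identifier tokens, the mean over generation steps, and the citation threshold are all assumptions made for illustration. The names `doc_contributions`, `attribute`, `doc_ids`, and the toy logits are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def doc_contributions(step_logits: Sequence[float],
                      doc_token_ids: Dict[str, int]) -> Dict[str, float]:
    """Step 1 (assumed form): softmax over the identifier-token logits only,
    giving one contribution score per document for a single generation step."""
    items = list(doc_token_ids.items())
    vals = [step_logits[token_id] for _, token_id in items]
    peak = max(vals)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in vals]
    total = sum(exps)
    return {doc: e / total for (doc, _), e in zip(items, exps)}


def attribute(all_step_logits: Sequence[Sequence[float]],
              doc_token_ids: Dict[str, int],
              threshold: float = 0.5) -> Tuple[Dict[str, float], List[str]]:
    """Step 2 (assumed form): average per-step contributions over the generated
    tokens, then cite every document whose mean score reaches the threshold."""
    if not all_step_logits:
        raise ValueError("need at least one generation step")
    sums = {doc: 0.0 for doc in doc_token_ids}
    for step_logits in all_step_logits:
        for doc, score in doc_contributions(step_logits, doc_token_ids).items():
            sums[doc] += score
    means = {doc: s / len(all_step_logits) for doc, s in sums.items()}
    cited = [doc for doc, score in means.items() if score >= threshold]
    return means, cited


# Toy example: a 6-token vocabulary in which ids 3, 4, 5 are the (hypothetical)
# identifier tokens used to mark documents D1, D2, D3 in the prompt.
doc_ids = {"D1": 3, "D2": 4, "D3": 5}
logits = [
    [0.1, 0.2, 0.0, 2.0, 0.0, 0.0],  # logits at generation step 1
    [0.0, 0.0, 0.0, 3.0, 1.0, 0.0],  # logits at generation step 2
]
means, cited = attribute(logits, doc_ids)
print(means, cited)
```

In this toy run the identifier token for D1 has the largest logit at both steps, so D1 is the only document cited. Because the scores are read from logits already computed while generating the answer, a scheme of this kind needs no extra forward pass, which is consistent with the latency argument made in the abstract.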