Fine-Grained Self-Endorsement Improves Factuality and Reasoning
The work addresses factual inaccuracies in LLM outputs, which matter to users who rely on generated content, though the contribution is incremental because it builds on prior ensemble methods.
The paper tackles fact-conflicting hallucinations in large language model generations by proposing a self-endorsement framework that uses fine-grained fact-level comparisons across multiple responses, showing improvements in factuality on tasks like Biographies and potential for broader applications.
This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. In particular, we propose a self-endorsement framework that leverages fine-grained, fact-level comparisons across multiple sampled responses. Compared with prior ensemble methods (Wang et al., 2022; Chen et al., 2023) that perform response-level selection, our approach can better alleviate hallucinations, especially for long-form generation tasks. Our approach can broadly benefit smaller and open-source LLMs, as it mainly conducts simple content-based comparisons. Experiments on Biographies show that our method can effectively improve the factuality of generations with simple and intuitive prompts across different scales of LLMs. In addition, comprehensive analyses on TriviaQA and GSM8K demonstrate the potential of self-endorsement for broader application.
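The fact-level comparison described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how facts are extracted or verified, so the sentence-based splitter, the token-overlap verifier, and the names `split_into_facts`, `overlap_endorses`, `select_endorsed_facts`, `min_overlap`, and `threshold` are all assumptions made for illustration. In a real system, the verifier would presumably be an LLM prompted to judge whether another sampled response supports a given fact.

```python
import re
from typing import Callable, List, Set


def split_into_facts(response: str) -> List[str]:
    """Toy fact decomposition (assumption): treat each sentence as one fact."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p.strip() for p in parts if p.strip()]


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, used only by the toy verifier."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_endorses(fact: str, response: str, min_overlap: float = 0.9) -> bool:
    """Stand-in verifier (assumption): a response endorses a fact when
    at least `min_overlap` of the fact's tokens appear in the response.
    A real system would replace this with an LLM-based check."""
    fact_tokens = _tokens(fact)
    if not fact_tokens:
        return False
    return len(fact_tokens & _tokens(response)) / len(fact_tokens) >= min_overlap


def select_endorsed_facts(
    responses: List[str],
    endorses: Callable[[str, str], bool] = overlap_endorses,
    threshold: float = 0.5,
) -> List[str]:
    """Keep each fact that at least `threshold` of the OTHER responses endorse."""
    kept: List[str] = []
    seen = set()
    for i, response in enumerate(responses):
        others = responses[:i] + responses[i + 1:]
        if not others:
            continue  # a single response has nothing to be compared against
        for fact in split_into_facts(response):
            key = frozenset(_tokens(fact))
            if key in seen:
                continue  # skip facts already kept from an earlier response
            score = sum(endorses(fact, other) for other in others) / len(others)
            if score >= threshold:
                kept.append(fact)
                seen.add(key)
    return kept


samples = [
    "Ada Lovelace was born in 1815. She was born in Paris.",
    "Ada Lovelace was born in 1815. She worked with Charles Babbage.",
    "She worked with Charles Babbage. Ada Lovelace was born in 1815.",
]
endorsed = select_endorsed_facts(samples)
print(endorsed)
```

In this toy run, the claim about Paris appears in only one sample and is dropped, while the two facts supported by other samples are kept. The selection happens per fact, not per response, which is the contrast with response-level selection that the abstract draws; how the retained facts are turned back into a final answer is left open here.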