CL · AI · Oct 31, 2024

LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models

arXiv:2410.23526v1 · 2 citations · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the factual correctness of LLMs in knowledge-intensive domains like healthcare, but it appears incremental because it builds on existing methods such as RAG and fine-tuning.

The paper tackles factual inaccuracies in large language models, particularly in medical question answering, by introducing LEAF, which uses fact-checking to enhance retrieval-augmented generation and self-training. It reports improved factual reliability, though no concrete numbers are specified.

Large language models (LLMs) have shown remarkable capabilities in various natural language processing tasks, yet they often struggle to maintain factual accuracy, particularly in knowledge-intensive domains like healthcare. This study introduces LEAF: Learning and Evaluation Augmented by Fact-Checking, a novel approach designed to enhance the factual reliability of LLMs, with a focus on medical question answering (QA). LEAF uses a dual strategy to improve the factual accuracy of responses from models such as Llama 3 70B Instruct and Llama 3 8B Instruct. The first strategy, Fact-Check-Then-RAG, improves Retrieval-Augmented Generation (RAG) by incorporating fact-checking results to guide the retrieval process without updating model parameters. The second strategy, Learning from Fact-Checks via Self-Training, involves supervised fine-tuning (SFT) on fact-checked responses or applying Simple Preference Optimization (SimPO) with fact-checking as a ranking mechanism; both variants update LLM parameters from supervision. These findings suggest that integrating fact-checked responses, whether through RAG enhancement or self-training, improves the reliability and factual correctness of LLM outputs, offering a promising solution for applications where information accuracy is crucial.
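The two strategies in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy knowledge base `KNOWN_FACTS`, the sentence-level claim splitter, the `toy_generate` and `toy_retrieve` stubs, and the default `beta` and `gamma` values are all hypothetical stand-ins chosen so the example runs on its own. The control flow (draft, fact-check, retrieve only for failed claims, regenerate) is one plausible reading of "Fact-Check-Then-RAG", and the loss follows the published SimPO objective (length-normalized log-probabilities with a target margin).

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical toy knowledge base standing in for a real fact-checker.
KNOWN_FACTS = {"insulin lowers blood glucose"}


@dataclass
class Verdict:
    claim: str
    supported: bool


def fact_check(response: str) -> List[Verdict]:
    """Split a response into sentence-level claims and check each one."""
    claims = [c.strip().lower() for c in response.split(".") if c.strip()]
    return [Verdict(c, c in KNOWN_FACTS) for c in claims]


def factuality_score(response: str) -> float:
    """Fraction of claims the (toy) fact-checker supports."""
    verdicts = fact_check(response)
    if not verdicts:
        return 0.0
    return sum(v.supported for v in verdicts) / len(verdicts)


def fact_check_then_rag(
    question: str,
    generate: Callable[[str, List[str]], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Draft an answer, fact-check it, and retrieve only for failed claims.
    No model parameters are updated in this path."""
    draft = generate(question, [])
    failed = [v.claim for v in fact_check(draft) if not v.supported]
    if not failed:
        return draft
    evidence = [doc for claim in failed for doc in retrieve(claim)]
    return generate(question, evidence)


def build_preference_pair(candidates: List[str]) -> Tuple[str, str]:
    """Rank sampled responses by factuality; return (chosen, rejected)."""
    ranked = sorted(candidates, key=factuality_score, reverse=True)
    return ranked[0], ranked[-1]


def simpo_loss(logp_chosen: float, len_chosen: int,
               logp_rejected: float, len_rejected: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO loss for one pair: -log sigmoid(margin), where the margin uses
    length-normalized sequence log-probabilities minus a target gap gamma."""
    margin = beta * (logp_chosen / len_chosen
                     - logp_rejected / len_rejected) - gamma
    return math.log1p(math.exp(-margin))


# Hypothetical stubs in place of an LLM and a retriever.
def toy_generate(question: str, evidence: List[str]) -> str:
    return evidence[0] if evidence else "Insulin raises blood glucose."


def toy_retrieve(claim: str) -> List[str]:
    return ["Insulin lowers blood glucose."]


answer = fact_check_then_rag("What does insulin do to blood glucose?",
                             toy_generate, toy_retrieve)
chosen, rejected = build_preference_pair(
    ["Insulin raises blood glucose.", "Insulin lowers blood glucose."])
```

In this sketch the SFT variant would simply keep responses whose `factuality_score` clears a threshold as training targets, while the SimPO variant would feed `(chosen, rejected)` pairs and their model log-probabilities into `simpo_loss`.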

