ALoFTRAG: Automatic Local Fine Tuning for Retrieval Augmented Generation
ALoFTRAG provides a practical, cost-effective way to improve RAG accuracy in sensitive domains such as healthcare and finance, although the contribution is incremental because it builds on existing RAG and fine-tuning methods.
The paper tackles the low accuracy of Retrieval Augmented Generation (RAG) systems on new data domains by introducing the ALoFTRAG framework, which improves citation accuracy by an average of 8.3% and answer accuracy by an average of 3.0% across 20 datasets in 26 languages.
Retrieval Augmented Generation (RAG) systems have been shown to improve the accuracy of Large Language Model (LLM) outputs. However, these systems often achieve low accuracy when applied to new data domains. We introduce the Automatic Local Fine Tuning of Retrieval Augmented Generation models (ALoFTRAG) framework, designed to improve the accuracy of RAG systems on a given domain by training LLMs without manually labeled data or larger teacher models. By generating and filtering synthetic training data and performing LoRA fine-tuning, ALoFTRAG improves citation and answer accuracy across 20 datasets in 26 languages by, on average, 8.3% and 3.0% respectively. Our results demonstrate that ALoFTRAG offers a practical, cost-effective, and data-secure solution for improving RAG accuracy, making it particularly applicable to sensitive domains such as healthcare and finance.
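The generate-and-filter step described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The names `Example`, `build_training_set`, `generate_qa`, and `cite` are hypothetical, the way distractor chunks are chosen is an assumption, and the filtering rule (keep an example only if the model cites the chunk the question was generated from) is an assumed criterion rather than one stated in the abstract. Both callables stand in for calls to the same local LLM, so no labeled data or larger teacher model is involved.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    """One synthetic RAG training example (hypothetical structure)."""
    question: str
    answer: str
    context_ids: List[int]  # indices of the chunks shown to the model
    cited_id: int           # index of the chunk the answer is based on


def build_training_set(
    chunks: List[str],
    generate_qa: Callable[[str], Tuple[str, str]],
    cite: Callable[[str, List[str]], int],
    n_distractors: int = 2,
) -> List[Example]:
    """Generate one Q&A pair per chunk, then filter by citation correctness.

    generate_qa(chunk) returns a (question, answer) pair written by the LLM.
    cite(question, contexts) returns the position in `contexts` that the LLM
    cites when answering `question`.
    """
    examples: List[Example] = []
    n = len(chunks)
    for i, chunk in enumerate(chunks):
        question, answer = generate_qa(chunk)

        # Assumption: use the following chunks (wrapping around) as
        # distractors; a real system might retrieve similar chunks instead.
        others = {(i + k) % n for k in range(1, n_distractors + 1)} - {i}
        context_ids = sorted(others | {i})

        # Assumed filter: keep the example only when the model cites the
        # chunk that the question was generated from.
        predicted = cite(question, [chunks[j] for j in context_ids])
        if context_ids[predicted] == i:
            examples.append(Example(question, answer, context_ids, i))
    return examples
```

The surviving examples would then be formatted as prompts and used for LoRA fine-tuning of the same model, a step not shown here.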