CL, IR · Feb 10, 2025

Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection

Yan Weng, Fengbin Zhu, Tong Ye, Haoyan Liu, Fuli Feng, Tat-Seng Chua

arXiv:2502.06148v16.74 citations · h-index: 18 · Has Code

Originality: Highly original

AI Analysis

This work addresses how Large Language Models can effectively combine retrieved external knowledge with their internal parametric knowledge to produce more accurate and reliable responses, a significant problem for natural language processing applications.

The authors propose a Self-Selection RAG framework for integrating external knowledge into Large Language Models. In experiments with two open-source LLMs, the approach outperformed baseline methods on the Natural Questions and TriviaQA datasets.

Retrieval-Augmented Generation (RAG), which integrates external knowledge into Large Language Models (LLMs), has proven effective in enabling LLMs to produce more accurate and reliable responses. However, effectively integrating external retrieved knowledge with the internal parametric knowledge of LLMs remains a significant challenge. In this work, we propose a novel Self-Selection RAG framework, in which the LLM selects between a pair of responses, one generated with internal parametric knowledge alone and one generated with external retrieved knowledge as well, to achieve enhanced accuracy. To this end, we devise a Self-Selection-RGP method that enhances the LLM's ability both to generate and to select the correct answer by training the LLM with Direct Preference Optimization (DPO) over a curated Retrieval Generation Preference (RGP) dataset. Experimental results with two open-source LLMs (i.e., Llama2-13B-Chat and Mistral-7B) demonstrate the superiority of our approach over baseline methods on the Natural Questions (NQ) and TriviaQA datasets.
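
The selection step and the DPO objective described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `generate` and `select` callables, the function names, and the default `beta` value are hypothetical, and the abstract does not specify what context the selector sees. `dpo_loss` is the standard per-pair DPO loss computed from sequence log-probabilities under the trained policy and a frozen reference model.

```python
import math
from typing import Callable, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
#   generate(question, passages) -> answer text
#   select(question, answer_a, answer_b) -> 0 or 1 (index of the preferred answer)
Generate = Callable[[str, Sequence[str]], str]
Select = Callable[[str, str, str], int]


def self_select(question: str, passages: Sequence[str],
                generate: Generate, select: Select) -> str:
    """Generate one answer without and one with retrieved passages, then pick one."""
    internal_answer = generate(question, [])         # parametric knowledge only
    retrieved_answer = generate(question, passages)  # with retrieved passages
    choice = select(question, internal_answer, retrieved_answer)
    return (internal_answer, retrieved_answer)[choice]


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = -beta * margin
    # Numerically stable softplus(x) = log(1 + exp(x)), equal to -log(sigmoid(-x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

In this sketch the same model would typically back both `generate` and `select`, and the loss decreases as the policy assigns relatively more probability to the preferred response than the reference model does.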
