CL | Dec 25, 2024

Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation

Xinkai Du, Quanjie Han, Chao Lv, Yan Liu, Yalin Sun, Hao Shu, Hongbo Shan, Maosong Sun

arXiv:2412.18800v1 | 1.0 | h-index: 3 | ICASSP

Originality: Incremental advance

AI Analysis

This work addresses the lack of definitive labels for pairing retrieved and generated knowledge sources in QA, offering an incremental improvement for researchers and practitioners.

The paper tackles the problem of combining retrieved and generated knowledge in open-domain QA without labeled data, proposing an unsupervised framework that improves performance by +1.7 and +1.6 on the NQ and WebQ datasets, respectively.

Open-domain Question Answering (QA) has garnered substantial interest by combining the advantages of faithfully retrieved passages and relevant passages generated through Large Language Models (LLMs). However, there is a lack of definitive labels available to pair these sources of knowledge. To address this issue, we propose a simple, unsupervised framework called Bi-Reranking for Merging Generated and Retrieved Knowledge (BRMGR), which applies re-ranking methods to both retrieved passages and LLM-generated passages. We rank the two types of passages using two separate re-ranking methods and then pair them through greedy matching. We demonstrate that BRMGR is equivalent to employing a bipartite matching loss when assigning each retrieved passage a corresponding LLM-generated passage. In experiments on three datasets, BRMGR improves performance by +1.7 and +1.6 on the NQ and WebQ datasets, respectively, and obtains comparable results on the TriviaQA dataset when compared to competitive baselines.
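The pairing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the two re-rankers, so the scores below are made-up numbers, and the names `rerank` and `greedy_pair` are hypothetical. The sketch assumes that greedy matching means pairing passages rank by rank after each list has been re-ranked independently.

```python
from itertools import permutations


def rerank(passages, score):
    """Sort passages from highest to lowest score (stand-in for a real re-ranker)."""
    return sorted(passages, key=score, reverse=True)


def greedy_pair(retrieved, generated, score_r, score_g):
    """Re-rank each list separately, then pair passages rank by rank."""
    return list(zip(rerank(retrieved, score_r), rerank(generated, score_g)))


# Hypothetical relevance scores; a real system would compute these
# with two separate re-ranking models.
retrieved_scores = {"r1": 0.2, "r2": 0.9, "r3": 0.5}
generated_scores = {"g1": 0.7, "g2": 0.1, "g3": 0.4}

pairs = greedy_pair(
    list(retrieved_scores),
    list(generated_scores),
    retrieved_scores.get,
    generated_scores.get,
)


def total(assignment):
    """Assumed objective: sum of score products over all pairs."""
    return sum(retrieved_scores[r] * generated_scores[g] for r, g in assignment)


# Brute-force check over every one-to-one assignment of generated to retrieved.
best = max(
    total(zip(retrieved_scores, perm)) for perm in permutations(generated_scores)
)
greedy_is_optimal = abs(total(pairs) - best) < 1e-9
```

Under the assumed sum-of-products objective, rank-aligned pairing attains the best one-to-one assignment (a consequence of the rearrangement inequality), which gives some intuition for why a greedy scheme can coincide with a bipartite matching objective. The objective actually used in the paper may differ.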

