CL · IR · Apr 19, 2025

Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion

Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park

arXiv:2504.14175v2 · 3 citations · h-index: 3 · ACL

Originality: Synthesis-oriented

AI Analysis

This work addresses potential benchmark contamination issues for researchers in information retrieval and fact verification, highlighting an incremental concern about evaluation validity.

The paper investigates whether performance gains in LLM-based query expansion for retrieval tasks are due to knowledge leakage in benchmarks, using fact verification as a testbed, and finds that improvements correlate with generated documents containing sentences entailed by ground-truth evidence.

Query expansion methods powered by large language models (LLMs) have demonstrated effectiveness in zero-shot retrieval tasks. These methods assume that LLMs can generate hypothetical documents that, when incorporated into a query vector, enhance the retrieval of real evidence. However, we challenge this assumption by investigating whether knowledge leakage in benchmarks contributes to the observed performance gains. Using fact verification as a testbed, we analyze whether the generated documents contain information entailed by ground-truth evidence and assess their impact on performance. Our findings indicate that, on average, performance improvements consistently occurred for claims whose generated documents included sentences entailed by gold evidence. This suggests that knowledge leakage may be present in fact-verification benchmarks, potentially inflating the perceived performance of LLM-based query expansion methods.
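The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the hypothetical documents are folded into the query by averaging embeddings, which is one common choice that the abstract does not specify. It also uses a toy bag-of-words `embed` in place of a real dense encoder, and it takes the entailment labels and retrieval scores as given inputs rather than computing them with an NLI model. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumed stand-in for a dense encoder)."""
    return Counter(text.lower().split())


def mean_vector(vectors):
    """Average a list of sparse vectors term by term."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds counts
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expanded_query_vector(query, hypothetical_docs):
    """Fold LLM-generated documents into the query vector by averaging.

    Assumption: simple averaging of the query and document embeddings.
    """
    return mean_vector([embed(query)] + [embed(d) for d in hypothetical_docs])


def rank(query_vec, corpus):
    """Return corpus indices sorted by descending similarity to the query."""
    return sorted(
        range(len(corpus)),
        key=lambda i: cosine(query_vec, embed(corpus[i])),
        reverse=True,
    )


def mean_gain_by_leakage(records):
    """Average (expanded - baseline) score, split by the entailment flag.

    records: iterable of (has_entailed_sentence, baseline_score, expanded_score).
    The flag is assumed to come from an external entailment check against
    gold evidence; this sketch does not compute it.
    """
    gains = {True: [], False: []}
    for has_entailed_sentence, baseline, expanded in records:
        gains[has_entailed_sentence].append(expanded - baseline)
    return {
        flag: (sum(values) / len(values) if values else None)
        for flag, values in gains.items()
    }
```

If the mean gain for the `True` group is clearly larger than for the `False` group, that matches the pattern the abstract reports: the improvement is concentrated in claims whose generated documents contain sentences entailed by gold evidence.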
