InteGround: On the Evaluation of Verification and Retrieval Planning in Integrative Grounding
This paper addresses the challenge of keeping LLMs faithful on complex real-world queries that require evidence synthesis, though its contribution is incremental in that it builds on existing grounding approaches.
The paper tackles integrative grounding, where LLMs must retrieve and verify multiple interdependent pieces of evidence for complex queries, by evaluating methods on data repurposed from four domains. Key findings: LLMs tend to rationalize with internal knowledge when evidence is incomplete; undirected retrieval planning degrades performance by introducing noise; premise abduction is promising because of its logical constraints; and zero-shot self-reflection consistently improves grounding quality.
Grounding large language models (LLMs) in external knowledge sources is a promising method for faithful prediction. While existing grounding approaches work well for simple queries, many real-world information needs require synthesizing multiple pieces of evidence. We introduce "integrative grounding" -- the challenge of retrieving and verifying multiple inter-dependent pieces of evidence to support a hypothesis query. To systematically study this problem, we repurpose data from four domains for evaluating integrative grounding capabilities. Our investigation reveals two critical findings: First, in groundedness verification, while LLMs are robust to redundant evidence, they tend to rationalize using internal knowledge when information is incomplete. Second, in examining retrieval planning strategies, we find that undirected planning can degrade performance through noise introduction, while premise abduction emerges as a promising approach due to its logical constraints. Additionally, LLMs' zero-shot self-reflection capabilities consistently improve grounding quality. These insights provide valuable direction for developing more effective integrative grounding systems.
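To make the task definition concrete, the following is a minimal sketch of what an integrative grounding instance and a reference check might look like. It is not the paper's data format or evaluation code: the names `Instance`, `grounding_label`, and `premise_recall`, the premise identifiers, and the two labels are all assumptions made for illustration. The sketch assumes each hypothesis comes with a gold set of premises that jointly support it, so that a retrieved evidence set counts as grounded only when every required premise is present.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical integrative grounding instance (illustrative only)."""
    hypothesis: str
    required_premises: frozenset[str]  # ids of premises that jointly support the hypothesis


def grounding_label(instance: Instance, retrieved: set[str]) -> str:
    """Reference label: 'grounded' only if every required premise was retrieved.

    Extra (redundant or irrelevant) evidence does not change the label.
    """
    missing = instance.required_premises - set(retrieved)
    return "grounded" if not missing else "incomplete"


def premise_recall(instance: Instance, retrieved: set[str]) -> float:
    """Fraction of required premises covered by the retrieved evidence."""
    if not instance.required_premises:
        return 1.0
    covered = instance.required_premises & set(retrieved)
    return len(covered) / len(instance.required_premises)
```

Under these assumptions, a verifier that answers "grounded" for an evidence set this check labels "incomplete" would be showing the rationalization behavior described above, while a verifier whose answer is unchanged by extra evidence would be showing the robustness to redundancy.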