IR, AI | Sep 2, 2025

HF-RAG: Hierarchical Fusion-based RAG with Multiple Sources and Rankers

arXiv:2509.02837v1 | 2 citations | h-index: 24 | CIKM
Originality: Incremental advance
AI Analysis

This work addresses the problem of effectively integrating heterogeneous data sources in RAG for fact verification, representing an incremental advancement.

The paper tackles the challenge of combining labeled and unlabeled data in retrieval-augmented generation (RAG). It proposes a hierarchical fusion method that first aggregates the outputs of multiple rankers within each source and then standardizes the scores so the two sources can be merged. The method consistently improves over the best individual ranker or source and generalizes better out of domain on fact verification tasks.

Leveraging both labeled data (input-output associations) and unlabeled data (wider contextual grounding) may provide complementary benefits in retrieval-augmented generation (RAG). However, effectively combining evidence from these heterogeneous sources is challenging because their similarity scores are not directly comparable. Additionally, aggregating beliefs from the outputs of multiple rankers can improve the effectiveness of RAG. Our proposed method first aggregates the top documents from a number of IR models using a standard rank fusion technique for each source (labeled and unlabeled). Next, we standardize the retrieval score distributions within each source by applying a z-score transformation before merging the top-retrieved documents from the two sources. We evaluate our approach on the fact verification task, demonstrating that it consistently improves over the best-performing individual ranker or source and also shows better out-of-domain generalization.
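
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only says "a standard rank fusion technique", so reciprocal rank fusion (RRF) with the commonly used constant k = 60 is an assumption here, and the function names (`rrf`, `zscore`, `hierarchical_fusion`) and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion.

    Assumption: RRF stands in for the unnamed "standard rank fusion technique".
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


def zscore(scores):
    """Standardize scores within one source so sources become comparable."""
    if not scores:
        return {}
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: no document stands out within this source.
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - mu) / sigma for doc_id, s in scores.items()}


def hierarchical_fusion(rankings_by_source, top_k=5, k=60):
    """Stage 1: fuse rankers per source. Stage 2: z-score, then merge sources."""
    merged = []
    for source, rankings in rankings_by_source.items():
        fused = rrf(rankings, k)
        for doc_id, z in zscore(fused).items():
            merged.append((z, source, doc_id))
    # Highest standardized score first; ties broken by source, then doc id.
    merged.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(source, doc_id, z) for z, source, doc_id in merged[:top_k]]


# Hypothetical input: two rankers per source, each returning a ranked id list.
example = {
    "labeled": [["L1", "L2", "L3"], ["L2", "L1", "L3"]],
    "unlabeled": [["U1", "U2", "U3"], ["U1", "U3", "U2"]],
}
top = hierarchical_fusion(example, top_k=3)
```

Standardizing within each source before merging is what lets a document from the labeled collection compete with one from the unlabeled collection, since the raw fused scores of the two sources come from different distributions.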

