IR AI LG

Jun 17, 2025

RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition

Tim Cofala, Oleh Astappiev, William Xion, Hailay Teklehaymanot

arXiv:2506.14412v26.31 citations

h-index: 4

Originality: Synthesis-oriented

AI Analysis

This is an incremental contribution: it optimizes RAG systems for a specific competition dataset to improve accuracy on QA tasks.

The paper evaluates retrieval-augmented generation (RAG) approaches for improving factual correctness in question answering. Its final system achieved a correctness score of 1.13 and a faithfulness score of 0.55 in the non-human evaluation, placing third in the SIGIR 2025 LiveRAG Challenge.

Retrieval-Augmented Generation (RAG) enriches Large Language Models (LLMs) by combining their internal, parametric knowledge with external, non-parametric sources, with the goal of improving factual correctness and minimizing hallucinations. The LiveRAG 2025 challenge explores RAG solutions to maximize accuracy on DataMorgana's QA pairs, which are composed of single-hop and multi-hop questions. The challenge provides access to sparse OpenSearch and dense Pinecone indices of the Fineweb 10BT dataset. It restricts model use to LLMs with up to 10B parameters and final answer generation with Falcon-3-10B. A judge-LLM assesses the submitted answers along with human evaluators. By exploring distinct retriever combinations and RAG solutions under the challenge conditions, our final solution emerged using InstructRAG in combination with a Pinecone retriever and a BGE reranker. Our solution achieved a correctness score of 1.13 and a faithfulness score of 0.55 in the non-human evaluation, placing it overall in third place in the SIGIR 2025 LiveRAG Challenge.
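The abstract describes a retrieve, rerank, then generate pipeline. The following is a minimal sketch of that shape only, not the authors' implementation: the corpus, the `retrieve`, `rerank`, and `build_prompt` functions, and both scoring rules are hypothetical stand-ins. A bag-of-words cosine replaces the dense Pinecone retriever, a query-term coverage score replaces the BGE reranker, and the InstructRAG prompting and Falcon-3-10B generation step are not reproduced.

```python
import math
from collections import Counter

# Toy corpus standing in for the indexed dataset (illustrative only).
DOCS = [
    "Pinecone is a managed vector database for dense retrieval.",
    "OpenSearch supports sparse BM25 keyword search.",
    "Falcon-3-10B is a large language model with ten billion parameters.",
    "Rerankers rescore retrieved passages against the query.",
]


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return [t for t in tokens if t]


def cosine(a, b):
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, k=3):
    """First stage: toy stand-in for a dense retriever."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def rerank(query, passages, k=2):
    """Second stage: toy stand-in for a reranker.

    Scores each passage by the fraction of query terms it covers.
    """
    q = set(tokenize(query))

    def coverage(passage):
        return len(q & set(tokenize(passage))) / len(q) if q else 0.0

    return sorted(passages, key=coverage, reverse=True)[:k]


def build_prompt(query, passages):
    """Assemble the reranked passages and the question into one prompt."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


query = "What is a vector database for dense retrieval?"
candidates = retrieve(query, DOCS, k=3)
top_passages = rerank(query, candidates, k=2)
prompt = build_prompt(query, top_passages)
```

In a real system the resulting `prompt` would be passed to the generator model; retrieving more candidates than are finally kept (here 3, then 2) gives the second stage room to reorder them.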
