CLAIJan 12

From RAG to Agentic RAG for Faithful Islamic Question Answering

arXiv:2601.07528v13 citationsh-index: 37
Originality Incremental advance
AI Analysis

This addresses the need for faithful Islamic QA to prevent religious consequences, though it is incremental as it builds on existing RAG methods.

The paper tackled the problem of ungrounded responses in Islamic question answering by introducing ISLAMICFAITHQA, a bilingual benchmark for measuring hallucination and abstention, and developed an agentic RAG framework that achieved state-of-the-art performance with a small model like Qwen3 4B.

LLMs are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard MCQ/MRC-style evaluations do not capture key real-world failure modes, notably free-form hallucinations and whether models appropriately abstain when evidence is lacking. To shed a light on this aspect we introduce ISLAMICFAITHQA, a 3,810-item bilingual (Arabic/English) generative benchmark with atomic single-gold answers, which enables direct measurement of hallucination and abstention. We additionally developed an end-to-end grounded Islamic modelling suite consisting of (i) 25K Arabic text-grounded SFT reasoning pairs, (ii) 5K bilingual preference samples for reward-guided alignment, and (iii) a verse-level Qur'an retrieval corpus of $\sim$6k atomic verses (ayat). Building on these resources, we develop an agentic Quran-grounding framework (agentic RAG) that uses structured tool calls for iterative evidence seeking and answer revision. Experiments across Arabic-centric and multilingual LLMs show that retrieval improves correctness and that agentic RAG yields the largest gains beyond standard RAG, achieving state-of-the-art performance and stronger Arabic-English robustness even with a small model (i.e., Qwen3 4B). We will make the experimental resources and datasets publicly available for the community.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes