CL | Dec 12, 2024

KnowShiftQA: How Robust are RAG Systems when Textbook Knowledge Shifts in K-12 Education?

arXiv:2412.08985v4 | 2 citations | h-index: 17 | ACL
Originality: Synthesis-oriented
AI Analysis

This paper addresses the robustness of RAG systems used as question-answering tools in K-12 education. Its contribution is incremental: it evaluates existing methods rather than proposing new ones.

The paper examines how discrepancies between textbook knowledge and an LLM's parametric knowledge undermine Retrieval-Augmented Generation (RAG) systems in K-12 education. On a dataset of 3,005 questions with simulated knowledge shifts, most systems suffer a substantial performance drop.

Retrieval-Augmented Generation (RAG) systems show remarkable potential as question answering tools in the K-12 Education domain, where knowledge is typically queried within the restricted scope of authoritative textbooks. However, discrepancies between these textbooks and the parametric knowledge inherent in Large Language Models (LLMs) can undermine the effectiveness of RAG systems. To systematically investigate RAG system robustness against such knowledge discrepancies, we introduce KnowShiftQA. This novel question answering dataset simulates these discrepancies by applying deliberate hypothetical knowledge updates to both answers and source documents, reflecting how textbook knowledge can shift. KnowShiftQA comprises 3,005 questions across five subjects, designed with a comprehensive question typology focusing on context utilization and knowledge integration. Our extensive experiments on retrieval and question answering performance reveal that most RAG systems suffer a substantial performance drop when faced with these knowledge discrepancies. Furthermore, questions requiring the integration of contextual (textbook) knowledge with parametric (LLM) knowledge pose a significant challenge to current LLMs.
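The idea of a simulated knowledge shift can be illustrated with a toy sketch. This is not the authors' pipeline or data: `QAItem`, `apply_knowledge_shift`, `accuracy`, the example question, and the two stand-in predictors are all hypothetical names and values chosen for illustration. It assumes a shift is a plain string substitution applied to both the source document and the gold answer, and it compares a predictor that ignores the retrieved text with one that reads it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QAItem:
    """Hypothetical record: a question, its source passage, and the gold answer."""
    question: str
    document: str
    answer: str


def apply_knowledge_shift(item: QAItem, old_fact: str, new_fact: str) -> QAItem:
    """Apply a hypothetical update to BOTH the document and the answer."""
    if old_fact not in item.document:
        raise ValueError("old_fact must appear in the source document")
    return replace(
        item,
        document=item.document.replace(old_fact, new_fact),
        answer=item.answer.replace(old_fact, new_fact),
    )


def accuracy(items, predict) -> float:
    """Exact-match accuracy of predict(question, document) against gold answers."""
    correct = sum(
        predict(i.question, i.document).strip().lower() == i.answer.strip().lower()
        for i in items
    )
    return correct / len(items)


def parametric_only(question: str, document: str) -> str:
    """Stand-in for a model that ignores context and answers from memory."""
    return "carbon dioxide"


def context_reader(question: str, document: str) -> str:
    """Stand-in for a model that answers from the retrieved passage."""
    for candidate in ("carbon dioxide", "nitrogen"):
        if candidate in document:
            return candidate
    return ""


original = QAItem(
    question="Which gas do plants absorb during photosynthesis?",
    document="During photosynthesis, plants absorb carbon dioxide from the air.",
    answer="carbon dioxide",
)
# Deliberately counterfactual update, used only to simulate a shift.
shifted = apply_knowledge_shift(original, "carbon dioxide", "nitrogen")

drop = accuracy([original], parametric_only) - accuracy([shifted], parametric_only)
```

In this toy setup the memory-only predictor loses all of its accuracy on the shifted item, while the context reader is unaffected. That gap is the kind of performance drop the abstract describes, though real systems sit somewhere between these two extremes.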

