AI · Sep 9, 2025

Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study

arXiv:2509.07846v1 · 1 citation · h-index: 1 · TALE

Originality: Incremental advance

AI Analysis

This work addresses the problem of unreliable LLM-generated information in educational settings by providing practical RAG guidelines for educators and system designers.

The study compares vector-based and graph-based Retrieval Augmented Generation (RAG) methods for classroom question answering. It finds that OpenAI Vector Search RAG works well as a low-cost generalist for fact retrieval, while GraphRAG variants excel at thematic queries and at handling altered textbooks; a dynamic routing framework that sends each query to the better-suited method further improves fidelity and efficiency.
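To make "vector-based retrieval" concrete, here is a minimal sketch of the general technique: embed textbook chunks and the question as vectors, rank chunks by cosine similarity, and pass the top-k into the prompt. This is not OpenAI Vector Search's implementation. The bag-of-words `embed` function is a hypothetical stand-in for a learned embedding model, and `retrieve` and `build_prompt` are illustrative names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a learned embedding model:
    # a sparse bag-of-words vector of lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the query and keep the k best.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    # Ground the LLM by placing only the retrieved excerpts in its context.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the textbook excerpts below.\n"
        f"{context}\n\nQuestion: {query}"
    )


chunks = [
    "The French Revolution began in 1789 with the storming of the Bastille.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Mitochondria are the site of cellular respiration.",
]
print(build_prompt("When did the French Revolution begin?", chunks, k=1))
```

Because a fact question usually shares specific terms with a single passage, nearest-neighbour lookup like this is cheap and effective. That fits the summary's characterization of vector RAG as a low-cost generalist for fact retrieval.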

Large language models like ChatGPT are increasingly used in classrooms, but they often provide outdated or fabricated information that can mislead students. Retrieval Augmented Generation (RAG) improves the reliability of LLMs by grounding responses in external resources. We investigate two accessible RAG paradigms, vector-based retrieval and graph-based retrieval, to identify best practices for classroom question answering (QA). Existing comparative studies fail to account for pedagogical factors such as educational disciplines, question types, and practical deployment costs. Using EduScopeQA, a novel dataset of 3,176 questions across academic subjects, we measure performance on various educational query types, from specific facts to broad thematic discussions. We also evaluate system alignment with a dataset of systematically altered textbooks that contradict the LLM's latent knowledge. We find that OpenAI Vector Search RAG (representing vector-based RAG) performs well as a low-cost generalist, especially for quick fact retrieval. GraphRAG Global, on the other hand, excels at providing pedagogically rich answers to thematic queries, and GraphRAG Local achieves the highest accuracy on the dense, altered textbooks when corpus integrity is critical. Accounting for the 10-20x higher resource usage of GraphRAG (representing graph-based RAG), we show that a dynamic branching framework that routes each query to the optimal retrieval method boosts both fidelity and efficiency. These insights provide actionable guidelines for educators and system designers to integrate RAG-augmented LLMs into learning environments effectively.
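The abstract does not specify how the dynamic branching framework decides where to send a query. The sketch below shows one way such a router could work under stated assumptions, and it is not the authors' method. Corpus-critical queries go to GraphRAG Local, thematic queries go to GraphRAG Global, and everything else goes to cheap vector search. The keyword cues, the `corpus_altered` flag, and the 15x cost factor (an assumed midpoint of the reported 10-20x range) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical relative cost per query. The 15x factor is an assumed
# midpoint of the abstract's 10-20x range, not a measured value.
COST = {"vector": 1.0, "graphrag_global": 15.0, "graphrag_local": 15.0}

# Hypothetical, deliberately naive substring cues for broad thematic questions.
THEMATIC_CUES = ("why", "how does", "compare", "theme", "discuss", "significance")


@dataclass
class Query:
    text: str
    # True when the source textbook may contradict the LLM's prior knowledge.
    corpus_altered: bool = False


def route(q: Query) -> str:
    """Pick a retrieval backend using simple hypothetical heuristics."""
    if q.corpus_altered:
        return "graphrag_local"   # corpus fidelity matters most
    text = q.text.lower()
    if any(cue in text for cue in THEMATIC_CUES):
        return "graphrag_global"  # broad, thematic discussion
    return "vector"               # default: low-cost fact lookup


def workload_cost(queries: list[Query]) -> float:
    # Total relative cost when each query goes to its routed backend.
    return sum(COST[route(q)] for q in queries)


queries = [
    Query("What year did the Berlin Wall fall?"),
    Query("Define osmosis."),
    Query("Discuss the significance of the printing press."),
    Query("Who wrote the constitution in this textbook?", corpus_altered=True),
]
for q in queries:
    print(f"{route(q):16} {q.text}")
print("routed cost:", workload_cost(queries))
print("all-GraphRAG cost:", len(queries) * COST["graphrag_global"])
```

With this made-up four-query mix, routing costs 32 units against 60 for sending every query to GraphRAG. That illustrates, under the assumed costs, why sending fact questions to the cheaper method improves efficiency. A real router would likely replace the keyword cues with a trained or LLM-based query classifier.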

