LG · Nov 7, 2024

Enhancing classroom teaching with LLMs and RAG

Elizabeth A Mullins, Adrian Portillo, Kristalys Ruiz-Rohena, Aritran Piplai

arXiv:2411.04341v1 · 4.67 citations · h-index: 15 · SIGITE

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of outdated information in LLMs for educational applications, but it is incremental: it primarily evaluates the effectiveness of a data source rather than achieving high performance.

The study investigates Retrieval-Augmented Generation (RAG) pipelines that use course materials to support K-12 students. In initial tests using Reddit as a data source for cybersecurity information, average answer correctness did not exceed 50% for any chunk size, indicating that Reddit is a poor source for such questions.
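For readers unfamiliar with RAG, the retrieval step can be sketched roughly as follows: source text is split into word chunks of a configurable `chunk_size`, the chunks are ranked against the question, and the best matches are placed into the prompt. This is a minimal sketch under stated assumptions, not the authors' pipeline. Real systems usually rank chunks with learned embeddings; this example substitutes term-frequency cosine similarity so it runs offline, and it stops at building the prompt instead of calling an LLM. The names `chunk_words`, `retrieve`, and `build_prompt` and the sample notes are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text, chunk_size):
    """Split text into consecutive chunks of at most `chunk_size` words (no overlap)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def tokens(text):
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (a stand-in for embedding search)."""
    q = Counter(tokens(query))
    return sorted(chunks, key=lambda c: cosine(q, Counter(tokens(c))), reverse=True)[:k]


def build_prompt(query, context):
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    joined = "\n---\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}\nAnswer:"


# Hypothetical sample "course notes" used only for illustration.
notes = ("Phishing emails trick users into revealing passwords. "
         "Ransomware encrypts files and demands payment. "
         "Firewalls filter network traffic by rule.")
chunks = chunk_words(notes, 6)
question = "What does ransomware do to files?"
prompt = build_prompt(question, retrieve(question, chunks, k=1))
print(prompt)
```

The second argument of `chunk_words` is the knob that a chunk-size study varies: with 6-word chunks, "payment." lands in a different chunk from the rest of the ransomware sentence, while much larger chunks would pull unrelated text into the context.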

Large Language Models have become a valuable source of information for our daily inquiries. However, once training ends, their training data quickly becomes out of date, which makes RAG a useful tool for supplying more recent or pertinent data. In this work, we investigate how RAG pipelines that use course materials as a data source might help students in K-12 education. The initial research uses Reddit as a data source for up-to-date cybersecurity information. Chunk size is evaluated to determine the optimal amount of context needed to generate accurate answers. The experiment was run at several chunk sizes, and answer correctness was evaluated with RAGAs; average answer correctness did not exceed 50 percent for any chunk size. This suggests that Reddit is a poor source of data for answering questions about cybersecurity threats. The methodology successfully evaluated the data source, indicating that it could also be applied to evaluating the effectiveness of educational resources.
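The evaluation loop described in the abstract (run the pipeline at several chunk sizes, then average answer correctness and compare it with a 50 percent bar) can be sketched as below. RAGAs' answer-correctness metric, as documented, combines factual similarity with semantic similarity between the generated answer and the ground truth. To keep this example self-contained and offline, the sketch swaps in plain token-level F1 as a hypothetical stand-in, so its scores are not comparable to the paper's. `token_f1`, `sweep`, `exceeds_threshold`, and the `answer_fn` hook are illustrative names, not the authors' code, and the toy pipeline below produces made-up answers.

```python
import re
from collections import Counter


def _tokens(text):
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(answer, ground_truth):
    """Token-overlap F1 (a hypothetical stand-in for RAGAs answer correctness)."""
    a, g = Counter(_tokens(answer)), Counter(_tokens(ground_truth))
    common = sum((a & g).values())  # size of the multiset intersection of tokens
    if common == 0:
        return 0.0
    precision = common / sum(a.values())
    recall = common / sum(g.values())
    return 2 * precision * recall / (precision + recall)


def sweep(chunk_sizes, qa_pairs, answer_fn):
    """Mean correctness per chunk size.

    answer_fn(question, chunk_size) -> str is a hypothetical hook into a RAG pipeline.
    """
    if not qa_pairs:
        raise ValueError("need at least one (question, ground_truth) pair")
    results = {}
    for size in chunk_sizes:
        scores = [token_f1(answer_fn(q, size), truth) for q, truth in qa_pairs]
        results[size] = sum(scores) / len(scores)
    return results


def exceeds_threshold(results, threshold=0.5):
    """True if any chunk size's mean score is strictly above the threshold."""
    return any(score > threshold for score in results.values())


def fake_pipeline(question, chunk_size):
    """Toy stand-in for a RAG pipeline; its answers are invented for illustration."""
    return "Ransomware encrypts files" if chunk_size >= 256 else "Not sure"


qa = [("What does ransomware do?", "Ransomware encrypts files and demands payment")]
scores = sweep([128, 256], qa, fake_pipeline)
print(scores, exceeds_threshold(scores))
```

Replacing `token_f1` with the real RAGAs metric would leave the sweep loop unchanged; only the scoring function differs.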

