CLNov 28, 2024

Efficient Learning Content Retrieval with Knowledge Injection

arXiv:2412.00125v1h-index: 6
Originality Synthesis-oriented
AI Analysis

This addresses the challenge for learners navigating abundant educational resources in domains like ICT, though it is incremental as it combines existing methods like fine-tuning and RAG.

The study tackled the problem of efficiently retrieving educational content in online platforms by developing a domain-specific chatbot using fine-tuned Phi language models and a RAG system, achieving a precision of 0.84 and F1 score of 0.82 for the Phi-2 model with RAG.

With the rise of online education platforms, there is a growing abundance of educational content across various domain. It can be difficult to navigate the numerous available resources to find the most suitable training, especially in domains that include many interconnected areas, such as ICT. In this study, we propose a domain-specific chatbot application that requires limited resources, utilizing versions of the Phi language model to help learners with educational content. In the proposed method, Phi-2 and Phi-3 models were fine-tuned using QLoRA. The data required for fine-tuning was obtained from the Huawei Talent Platform, where courses are available at different levels of expertise in the field of computer science. RAG system was used to support the model, which was fine-tuned by 500 Q&A pairs. Additionally, a total of 420 Q&A pairs of content were extracted from different formats such as JSON, PPT, and DOC to create a vector database to be used in the RAG system. By using the fine-tuned model and RAG approach together, chatbots with different competencies were obtained. The questions and answers asked to the generated chatbots were saved separately and evaluated using ROUGE, BERTScore, METEOR, and BLEU metrics. The precision value of the Phi-2 model supported by RAG was 0.84 and the F1 score was 0.82. In addition to a total of 13 different evaluation metrics in 4 different categories, the answers of each model were compared with the created content and the most appropriate method was selected for real-life applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes