Developing a Tutoring Dialog Dataset to Optimize LLMs for Educational Use
The work offers a cost-effective way to implement LLM-based tutoring systems in educational settings, though the contribution is incremental.
The study addresses the challenge of using large language models (LLMs) for dialog-based tutoring by building a synthetic tutoring dialog dataset and fine-tuning a smaller LLM on it; the fine-tuned model performs on par with a larger model at lower cost.
Recent advances in large language models (LLMs) have shown promise for scalable educational applications, but their use in dialog-based tutoring systems remains challenging due to the need for effective pedagogical strategies and the high costs associated with expert-curated datasets. Our study explores the use of smaller, more affordable LLMs for one-on-one tutoring in the context of solving reading comprehension problems. We developed a synthetic tutoring dialog dataset, evaluated by human teachers, and fine-tuned a smaller LLM using this dataset. Furthermore, we conducted an interactive experiment comparing the performance of the fine-tuned model with a larger model in real-world tutoring scenarios. Our results show that the fine-tuned model performs on par with the larger model but at a lower cost, demonstrating a viable, cost-effective approach for implementing LLM-based tutoring systems in educational settings.