CL · AI · Dec 26, 2023

Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments

arXiv:2312.15842v3 · 9 citations · h-index: 16
Originality: Synthesis-oriented
AI Analysis

This research makes advanced AI scoring accessible in resource-constrained educational settings, though the contribution is incremental: it applies an existing distillation method to a specific domain.

This study tackles the problem of deploying large language models for automatic scoring in science education by distilling a fine-tuned LLM into a smaller neural network. The distilled student achieves 3% and 2% higher accuracy than ANN and TinyBERT models, respectively, while having 4,000 times fewer parameters than the teacher and running 10 times faster at inference than TinyBERT.

This study proposes a method for knowledge distillation (KD) of fine-tuned Large Language Models (LLMs) into smaller, more efficient, and accurate neural networks. We specifically target the challenge of deploying these models on resource-constrained devices. Our methodology trains the smaller student model (a neural network) on the prediction probabilities of the LLM, which serves as the teacher model; these probabilities act as soft labels. Training uses a specialized loss function tailored to learn from the LLM's output probabilities, so that the student model closely mimics the teacher's performance. To validate the KD approach, we used a large dataset, 7T, containing 6,684 student-written responses to science questions, along with three mathematical reasoning datasets of student-written responses graded by human experts. We compared accuracy against the state-of-the-art (SOTA) distilled model TinyBERT and against artificial neural network (ANN) models. Results show that the KD approach achieves 3% and 2% higher scoring accuracy than ANN and TinyBERT, respectively, and accuracy comparable to the teacher model. Furthermore, the student model has only 0.03M parameters, making it 4,000 times smaller than the teacher model and 10 times faster at inference than TinyBERT. The significance of this research lies in its potential to make advanced AI technologies, particularly automatic scoring, accessible in typical educational settings.
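The abstract does not spell out the "specialized loss function". As a hedged illustration only, the sketch below assumes a common soft-label formulation: cross-entropy between the teacher LLM's output probabilities and the student's softmax, mixed with ordinary cross-entropy on the human-assigned score through a weight `alpha`. The function names (`kd_loss`, `batch_kd_loss`), the `alpha` parameter, and the two-term mix are hypothetical choices, not the paper's confirmed method.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Turn raw student scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], pred: Sequence[float], eps: float = 1e-12) -> float:
    """H(target, pred) = -sum_k target_k * log(pred_k); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def kd_loss(
    student_logits: Sequence[float],
    teacher_probs: Sequence[float],
    true_label: int,
    alpha: float = 0.5,
) -> float:
    """Hypothetical KD loss for one response: soft-label term plus hard-label term.

    teacher_probs: the fine-tuned LLM's probabilities over score classes (soft labels).
    true_label:    index of the human-assigned score class.
    alpha:         assumed weight on the soft-label term (1.0 = learn from teacher only).
    """
    student_probs = softmax(student_logits)
    # Soft-label term: pull the student's distribution toward the teacher's.
    soft_term = cross_entropy(teacher_probs, student_probs)
    # Hard-label term: ordinary cross-entropy against the human score.
    one_hot = [1.0 if k == true_label else 0.0 for k in range(len(student_probs))]
    hard_term = cross_entropy(one_hot, student_probs)
    return alpha * soft_term + (1.0 - alpha) * hard_term


def batch_kd_loss(
    batch_logits: Sequence[Sequence[float]],
    batch_teacher_probs: Sequence[Sequence[float]],
    batch_labels: Sequence[int],
    alpha: float = 0.5,
) -> float:
    """Mean KD loss over a batch of student responses."""
    losses = [
        kd_loss(s, t, y, alpha)
        for s, t, y in zip(batch_logits, batch_teacher_probs, batch_labels)
    ]
    return sum(losses) / len(losses)
```

For a three-level rubric, `kd_loss([0.2, 1.5, 0.3], [0.1, 0.7, 0.2], true_label=1)` returns one scalar loss. With `alpha=1.0` the student learns from the teacher's soft labels alone, and that term is smallest when the student's softmax matches `teacher_probs` exactly. In practice the loss would be minimized with a gradient-based framework rather than computed in pure Python.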

