CYAILG · Aug 19, 2022

MonaCoBERT: Monotonic attention based ConvBERT for Knowledge Tracing

arXiv:2208.12615v2 · 12 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving both accuracy and explainability in educational AI systems for predicting student performance, representing an incremental advancement over prior models.

The authors tackled the problem of balancing performance and interpretability in knowledge tracing by proposing MonaCoBERT, which achieved state-of-the-art results on most benchmark datasets while providing significant interpretability through techniques like monotonic attention and CTT-based embedding.

Knowledge tracing (KT) is a field of study that predicts students' future performance from prior performance data collected by educational applications such as intelligent tutoring systems, learning management systems, and online courses. Some previous studies on KT have concentrated only on the interpretability of the model, whereas others have focused on enhancing performance. Few models address both interpretability and performance. Moreover, models that focus on performance have not shown a decisive improvement over existing models. In this study, we propose MonaCoBERT, which achieves the best performance on most benchmark datasets and has significant interpretability. MonaCoBERT uses a BERT-based architecture with monotonic convolutional multihead attention, which reflects students' forgetting behavior and increases the representation power of the model. Performance and interpretability are further improved by a classical test theory-based (CTT-based) embedding strategy that considers the difficulty of each question. To determine why MonaCoBERT achieved the best performance and to interpret the results quantitatively, we conducted ablation studies and additional analyses using Grad-CAM, UMAP, and various visualization techniques. The analysis results demonstrate that both attention components complement one another and that CTT-based embedding represents information on both global and local difficulties. We also demonstrate that our model represents the relationships between concepts.
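
The two ideas named in the abstract, a difficulty signal from classical test theory and attention that favors recent interactions, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the 0 to 100 binning, the `theta` parameter, and the linear distance penalty are all assumptions made for illustration. In classical test theory, item difficulty is commonly taken as the proportion of correct responses to an item, which is what `ctt_difficulty` computes.

```python
import math
from collections import defaultdict


def ctt_difficulty(responses, n_levels=100):
    """Map each question to an integer difficulty level (hypothetical helper).

    responses: iterable of (question_id, correct) pairs, correct in {0, 1}.
    Returns {question_id: level} where level = round(n_levels * p_correct).
    A higher level means the question was answered correctly more often.
    The binning into n_levels is an assumption, chosen so the level could
    index an embedding table.
    """
    attempts = defaultdict(int)
    corrects = defaultdict(int)
    for qid, correct in responses:
        attempts[qid] += 1
        corrects[qid] += int(correct)
    return {q: round(n_levels * corrects[q] / attempts[q]) for q in attempts}


def decayed_attention(scores, theta=0.5):
    """Softmax over raw scores with a penalty that grows with distance.

    scores: raw attention scores from the current (last) step to steps 0..t.
    theta:  assumed decay rate; larger values forget older steps faster.
    This is a generic distance-penalised softmax, not the exact
    formulation used in the paper.
    """
    t = len(scores) - 1
    logits = [s - theta * (t - j) for j, s in enumerate(scores)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    log = [("q1", 1), ("q1", 0), ("q2", 1), ("q2", 1), ("q3", 0)]
    print(ctt_difficulty(log))
    print(decayed_attention([1.0, 1.0, 1.0]))
```

With the toy log above, `q1` maps to level 50, `q2` to 100, and `q3` to 0. With equal raw scores, the attention weights increase toward the most recent step, which is the forgetting effect the abstract attributes to monotonic attention.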

Code Implementations: 1 repo