CY, LG, ML · Jan 30, 2024

Analysis of Knowledge Tracing performance on synthesised student data

arXiv:2401.16832v1 · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity and data quality issues in educational AI, which matter to researchers and practitioners alike. It is incremental, however: it builds on existing methods without major breakthroughs.

The paper tackles the problem of limited and noisy real student data for Knowledge Tracing (KT) by simulating student data with three statistical strategies. It finds that training on synthetic data alone achieves performance similar to training on real data, while adding synthetic data to real data yields only minor improvements.
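The summary does not name the three simulation strategies, so the sketch below is purely illustrative and is not one of the paper's methods. It shows one simple, hypothetical way to synthesise student records statistically from a public dataset: estimate how often each skill appears and how often it is answered correctly, then sample new students from those marginals. The record layout `(student_id, skill_id, correct)` and the helper names `fit_skill_stats` and `simulate_students` are assumptions made for this example.

```python
import random
from collections import Counter


def fit_skill_stats(records):
    """Estimate skill frequencies and per-skill correctness rates (hypothetical helper).

    records: iterable of (student_id, skill_id, correct) tuples, with correct in {0, 1}.
    """
    attempts = Counter()
    successes = Counter()
    for _student, skill, correct in records:
        attempts[skill] += 1
        successes[skill] += correct
    rates = {skill: successes[skill] / attempts[skill] for skill in attempts}
    return attempts, rates


def simulate_students(records, n_students, seq_len, seed=0):
    """Sample synthetic (student, skill, correct) records from the fitted marginals.

    Skills are drawn in proportion to their frequency in the real data, and each
    answer is a Bernoulli draw with that skill's observed correctness rate.
    """
    rng = random.Random(seed)
    attempts, rates = fit_skill_stats(records)
    skills = list(attempts)
    weights = [attempts[skill] for skill in skills]
    synthetic = []
    for student in range(n_students):
        for skill in rng.choices(skills, weights=weights, k=seq_len):
            correct = int(rng.random() < rates[skill])
            synthetic.append((f"syn_{student}", skill, correct))
    return synthetic


# Usage with a toy "real" dataset (made-up records for illustration only).
real = [("s1", "add", 1), ("s1", "add", 0), ("s1", "sub", 1), ("s2", "sub", 1)]
fake = simulate_students(real, n_students=3, seq_len=5)
print(len(fake))  # 15 records: 3 synthetic students x 5 attempts each
```

This marginal-sampling approach deliberately ignores learning over time (each answer is independent of earlier ones), which keeps it simple but means it cannot reproduce the knowledge-state dynamics that KT models are meant to capture.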

Knowledge Tracing (KT) aims to predict the future performance of students by tracking the development of their knowledge states. Despite recent progress in this field, the application of KT models in education systems is still restricted by data issues: 1) limited access to real-life data due to data protection concerns, 2) lack of diversity in public datasets, and 3) noise in benchmark datasets, such as duplicate records. To address these problems, we simulated student data with three statistical strategies based on public datasets and tested their performance on two KT baselines. While we observe only a minor performance improvement with additional synthetic data, our work shows that training on synthetic data alone can yield performance similar to training on real data.
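To make "tracking the development of knowledge states" concrete, here is a minimal sketch of Bayesian Knowledge Tracing (BKT), a classic KT model. The abstract does not name its two baselines, so this is not necessarily either of them. The parameter names (`p_init`, `p_learn`, `p_slip`, `p_guess`) and their default values are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float = 0.3   # prior probability the skill is already known
    p_learn: float = 0.1  # probability of learning the skill after each attempt
    p_slip: float = 0.1   # probability of answering wrongly despite knowing the skill
    p_guess: float = 0.2  # probability of answering correctly without knowing it


def predict_correct(p_known, params):
    """Probability of a correct answer given the current knowledge estimate."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known, correct, params):
    """Bayesian posterior on the observed answer, followed by the learning transition."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_learn


def trace(responses, params=None):
    """Predict each response before seeing it, then update the knowledge state."""
    params = params or BKTParams()
    p_known = params.p_init
    predictions = []
    for correct in responses:
        predictions.append(predict_correct(p_known, params))
        p_known = update(p_known, correct, params)
    return predictions, p_known


preds, final = trace([1, 0, 1, 1])
print([round(p, 3) for p in preds], round(final, 3))
```

Each prediction is made before the answer is observed, which mirrors how KT models are evaluated: the model sees a student's history and must forecast the next response.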

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
