LG · Dec 5, 2024

Scalable Early Childhood Reading Performance Prediction

arXiv:2412.10401v1 · 4 citations · h-index: 31 · NIPS
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of helping educators identify at-risk students, but it is incremental: it primarily introduces a new dataset and applies existing methods.

The authors tackled the lack of public datasets for predicting early childhood reading performance by introducing the ECRI dataset, a large-scale longitudinal collection from 44 schools, and demonstrated that a simple self-supervised MLP outperforms baselines in recognizing educational patterns.

Models for student reading performance can empower educators and institutions to proactively identify at-risk students, thereby enabling early and tailored instructional interventions. However, there are no suitable publicly available educational datasets for modeling and predicting future reading performance. In this work, we introduce the Enhanced Core Reading Instruction (ECRI) dataset, a novel large-scale longitudinal tabular dataset collected across 44 schools with 6,916 students and 172 teachers. We leverage the dataset to empirically evaluate the ability of state-of-the-art machine learning models to recognize early childhood educational patterns in multivariate and partial measurements. Specifically, we demonstrate that a simple self-supervised strategy, in which a Multi-Layer Perceptron (MLP) network is pre-trained over masked inputs, outperforms several strong baselines while generalizing over diverse educational settings. To facilitate future developments in precise modeling and responsible use of models for individualized and early intervention strategies, our data and code are available at https://ecri-data.github.io/.
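The masked-input pre-training idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the function names (`make_synthetic_scores`, `masked_pretrain_step`, `pretrain`), the 30% mask rate, the single hidden layer, and the choice to append the mask as extra input features are all assumptions made for illustration. The sketch hides a random subset of tabular features and trains a small MLP to reconstruct only the hidden entries.

```python
import numpy as np


def make_synthetic_scores(n_students, n_features, rng):
    """Hypothetical stand-in for tabular assessment data (NOT the ECRI dataset).

    Features are driven by two shared latent factors, so hidden entries
    can be partly inferred from the visible ones.
    """
    latent = rng.normal(size=(n_students, 2))
    mixing = rng.normal(size=(2, n_features))
    x = latent @ mixing + 0.1 * rng.normal(size=(n_students, n_features))
    return (x - x.mean(axis=0)) / x.std(axis=0)  # standardize each column


def init_mlp(n_in, n_hidden, n_out, rng):
    """One-hidden-layer MLP parameters with scaled Gaussian initialization."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x):
    """Return the reconstruction and the ReLU hidden activations."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    return h @ params["W2"] + params["b2"], h


def masked_pretrain_step(params, x, rng, mask_rate=0.3, lr=0.1):
    """One full-batch gradient step of masked reconstruction (assumed objective)."""
    mask = rng.random(x.shape) < mask_rate  # True = entry is hidden from the model
    # Zero out hidden entries and append the mask so the model knows what is missing.
    x_in = np.concatenate([np.where(mask, 0.0, x), mask.astype(float)], axis=1)
    pred, h = forward(params, x_in)

    n_masked = max(int(mask.sum()), 1)
    err = (pred - x) * mask  # loss is computed on hidden entries only
    loss = float((err ** 2).sum() / n_masked)

    # Manual backpropagation of the masked mean-squared error.
    d_out = 2.0 * err / n_masked
    d_h = (d_out @ params["W2"].T) * (h > 0)
    grads = {
        "W2": h.T @ d_out,
        "b2": d_out.sum(axis=0),
        "W1": x_in.T @ d_h,
        "b1": d_h.sum(axis=0),
    }
    for name, grad in grads.items():
        params[name] -= lr * grad
    return loss


def pretrain(x, n_hidden=32, steps=300, seed=0):
    """Run masked pre-training and return the parameters and per-step losses."""
    rng = np.random.default_rng(seed)
    params = init_mlp(2 * x.shape[1], n_hidden, x.shape[1], rng)
    losses = [masked_pretrain_step(params, x, rng) for _ in range(steps)]
    return params, losses


if __name__ == "__main__":
    data = make_synthetic_scores(500, 8, np.random.default_rng(42))
    _, history = pretrain(data)
    print("first-10 mean loss:", np.mean(history[:10]))
    print("last-10 mean loss:", np.mean(history[-10:]))
```

In a pipeline of this kind, the pre-trained hidden layer would typically be reused and fine-tuned with a supervised head on the downstream prediction target; that step is omitted here because the abstract does not specify it.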

