A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

Ricardo Hidalgo-Aragón, Jesús M. González-Barahona, Gregorio Robles

arXiv:2604.00730

AI Analysis

The framework offers schools and training platforms a transparent, automated assessment tool for diagnosing curriculum gaps and supporting personalized learning. The contribution is incremental, since it builds on existing methods such as Dr.Scratch and CEFR.

The study addresses the problem of assessing programming proficiency at scale in Scratch by introducing a CEFR-aligned framework that applies Fuzzy C-Means clustering to over 2 million projects. It identifies a 'B2 bottleneck', where only 13.3% of learners reside, which the authors attribute to the cognitive load of integrating Logic, Synchronization, and Data Representation.

Context: Schools, training platforms, and technology firms increasingly need to assess programming proficiency at scale with transparent, reproducible methods that support personalized learning pathways. Objective: This study introduces a pedagogical framework for Scratch project assessment, aligned with the Common European Framework of Reference (CEFR), providing universal competency levels for students and teachers alongside actionable insights for curriculum design. Method: We apply Fuzzy C-Means clustering to 2,008,246 Scratch projects evaluated via Dr.Scratch, implement an ordinal criterion to map clusters to CEFR levels (A1-C2), and introduce enhanced classification metrics that identify transitional learners, enable continuous progress tracking, and quantify classification certainty to balance automated feedback with instructor review. Impact: The framework enables diagnosis of systemic curriculum gaps, notably a "B2 bottleneck" where only 13.3% of learners reside due to the cognitive load of integrating Logic, Synchronization, and Data Representation, while providing certainty-based triggers for human intervention.
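
The Method step can be pictured with the minimal sketch below. It is not the authors' implementation: the fuzzifier m = 2, the rule of ranking clusters by summed centroid score, the 0.2 membership margin used to flag transitional learners, and the synthetic three-dimensional scores are all assumptions made for illustration. The names fuzzy_c_means and assign_levels are hypothetical.

```python
import numpy as np

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means. Returns (centers, U), with U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the data points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def assign_levels(centers, U, margin=0.2):
    """Map clusters to ordered levels and derive per-project certainty.

    Assumed ordinal criterion: clusters are ranked by summed centroid score.
    Assumed transitional rule: top two memberships differ by less than margin.
    """
    order = np.argsort(centers.sum(axis=1))
    U_ord = U[:, order]  # column j now corresponds to CEFR_LEVELS[j]
    top2 = np.sort(U_ord, axis=1)[:, -2:]
    certainty = top2[:, 1]  # highest membership per project
    transitional = (top2[:, 1] - top2[:, 0]) < margin
    levels = [CEFR_LEVELS[j] for j in U_ord.argmax(axis=1)]
    return levels, certainty, transitional


if __name__ == "__main__":
    # Synthetic stand-in for per-project scores; not Dr.Scratch data.
    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 3.0, size=(600, 3))
    centers, U = fuzzy_c_means(X, c=len(CEFR_LEVELS))
    levels, certainty, transitional = assign_levels(centers, U)
    for name in CEFR_LEVELS:
        print(name, levels.count(name))
    print("flagged for instructor review:", int(transitional.sum()))
```

Because every project keeps a full membership vector instead of a single hard label, the same output supports both a level assignment and a certainty value that can route borderline cases to an instructor.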
