CL Oct 21, 2025

DeBERTa-KC: A Transformer-Based Classifier for Knowledge Construction in Online Learning Discourse

arXiv:2510.19858v1 2.7 h-index: 8

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for scalable, theory-informed tools to assess epistemic engagement in informal digital learning environments, though its contribution is incremental: it applies existing methods to a new domain.

The study addresses automatic classification of knowledge construction levels in online science learning discourse. Its transformer-based model, DeBERTa-KC, achieves a macro-F1 of 0.836 ± 0.008 and significantly outperforms the baselines.

This study presents DeBERTa-KC, a transformer-based model for automatic classification of knowledge construction (KC) levels in online science learning discourse. Using comments collected from four popular YouTube science channels (2022–2024), a balanced corpus of 20,000 manually annotated samples was created across four KC categories: nonKC, Share, Explore, and Negotiate. The proposed model extends DeBERTa-v3 with Focal Loss, Label Smoothing, and R-Drop regularization to address class imbalance and enhance generalization. A reproducible end-to-end pipeline was implemented, encompassing data extraction, annotation, preprocessing, training, and evaluation. Across 10-fold stratified cross-validation, DeBERTa-KC achieved a macro-F1 of 0.836 ± 0.008, significantly outperforming both classical and transformer baselines (p < 0.01). Per-category results indicate strong sensitivity to higher-order epistemic engagement, particularly in Explore and Negotiate discourse. These findings demonstrate that large language models can effectively capture nuanced indicators of knowledge construction in informal digital learning environments, offering scalable, theory-informed approaches to discourse analysis and the development of automated tools for assessing epistemic engagement.
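The abstract names three training-time techniques (Focal Loss, Label Smoothing, and R-Drop) without giving their formulation or hyperparameters. The following is a minimal pure-Python sketch of how these three terms are commonly combined for a single example, not the authors' implementation. The function names and the values of `gamma`, `eps`, and `alpha` are illustrative assumptions, and the task loss is averaged over the two forward passes, which is one common variant of R-Drop.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_targets(label, num_classes, eps):
    """Label smoothing: (1 - eps) on the true class plus eps / K on every class."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
            for k in range(num_classes)]

def focal_loss(logits, label, gamma=2.0, eps=0.1):
    """Focal loss against smoothed targets (gamma and eps are assumed values).

    The (1 - p)^gamma factor down-weights classes the model already predicts
    confidently; gamma = 0 and eps = 0 recover plain cross-entropy.
    """
    probs = softmax(logits)
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(t * (1.0 - p) ** gamma * math.log(max(p, 1e-12))
                for t, p in zip(targets, probs))

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with nonzero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def rdrop_loss(logits_a, logits_b, label, alpha=1.0, gamma=2.0, eps=0.1):
    """R-Drop style objective for two dropout passes over the same input.

    logits_a and logits_b stand for the outputs of two forward passes with
    different dropout masks. The task loss is averaged over both passes and
    a symmetric KL term penalizes disagreement between them (alpha is an
    assumed weight).
    """
    task = 0.5 * (focal_loss(logits_a, label, gamma, eps)
                  + focal_loss(logits_b, label, gamma, eps))
    p, q = softmax(logits_a), softmax(logits_b)
    consistency = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return task + alpha * consistency

# Four logits, one per KC category (nonKC, Share, Explore, Negotiate).
pass_1 = [2.0, 0.5, -1.0, 0.1]
pass_2 = [1.8, 0.7, -0.9, 0.0]
loss = rdrop_loss(pass_1, pass_2, label=0)
```

When both passes produce identical logits the consistency term vanishes and the objective reduces to the smoothed focal loss alone. In a real training loop the same computation would be done on batched tensors in a deep learning framework.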
