Cross-modal Affinity-aligned Multimodal Learning Analytics for Predicting Student Collaboration Satisfaction in Game-Based Learning
For educational technology researchers, this work addresses the practical problem of unreliable multimodal data in classroom deployments, offering a robust fusion method that adaptively handles uninformative modalities.
The paper tackles the challenge of predicting student collaboration satisfaction in game-based learning, where modality degradation (e.g., inconsistent eye gaze data) hinders multimodal fusion. The proposed AAMLA framework with CAMA module improves prediction accuracy over baselines, achieving consistent gains under standard and degraded conditions, validated on 50 students.
Collaborative game-based learning environments offer rich opportunities for small-group knowledge construction, yet automatically predicting student collaboration satisfaction remains challenging. A critical barrier is modality degradation: in educational deployments, individual modalities such as eye gaze exhibit inconsistent informativeness across student cohorts, causing implicit attention-based fusion to produce brittle multimodal representations. We propose the Affinity-Aligned Multimodal Learning Analytics (AAMLA) framework, whose core contribution is the Cross-modal Affinity-guided Modality Alignment (CAMA) module, which explicitly models inter-modal relationships via affinity matrices and enforces cross-modal consistency through contrastive learning, enabling adaptive suppression of uninformative modalities without discarding them. AAMLA further applies modality-specific projection layers to map heterogeneous features, including facial action units, head pose, eye gaze, and interaction trace logs, into a unified semantic space prior to alignment. Experiments on 50 middle school students in the EcoJourneys collaborative learning environment demonstrate consistent improvements over unimodal baselines and prior cross-attention approaches under standard and modality degradation conditions, with SHAP and t-SNE analyses confirming that CAMA produces robust, interpretable cross-modal representations for student collaboration modeling.