LG AI Mar 12, 2025

Privacy-Preserved Automated Scoring using Federated Learning for Educational Research

arXiv:2503.11711v2 6 citations h-index: 16

Originality: Incremental advance

AI Analysis

This work addresses data privacy concerns in educational research for schools and institutions, though the advance is incremental: it builds on existing federated learning and fine-tuning methods.

The study tackles automated scoring of educational assessments while preserving student data privacy, proposing a federated learning framework that combines parameter-efficient fine-tuning with adaptive aggregation. The model achieved 94.5% accuracy, within 0.5-1.0 percentage points of the centralized model, and comparable rubric-level scoring accuracy, with a 1.3% difference in rubric match.

Data privacy remains a critical concern in educational research, requiring strict adherence to ethical standards and regulatory protocols. While traditional approaches rely on anonymization and centralized data collection, they often expose raw student data to security vulnerabilities and impose substantial logistical overhead. In this study, we propose a federated learning (FL) framework for automated scoring of educational assessments that eliminates the need to share sensitive data across institutions. Our approach leverages parameter-efficient fine-tuning of large language models (LLMs) with Low-Rank Adaptation (LoRA), enabling each client (school) to train locally while sharing only optimized model updates. To address data heterogeneity, we implement an adaptive weighted aggregation strategy that considers both client performance and data volume. We benchmark our model against two state-of-the-art FL methods and a centralized learning baseline using NGSS-aligned multi-label science assessment data from nine middle schools. Results show that our model achieves the highest accuracy (94.5%) among FL approaches, and performs within 0.5-1.0 percentage points of the centralized model on these metrics. Additionally, it achieves comparable rubric-level scoring accuracy, with only a 1.3% difference in rubric match and a lower score deviation (MAE), highlighting its effectiveness in preserving both prediction quality and interpretability.
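
The abstract describes the aggregation step only at a high level: each school shares its locally trained LoRA update, and the server combines the updates with weights that reflect both client performance and data volume. The sketch below illustrates one way such a scheme could look. It is not the authors' formula: the blending rule, the `alpha` parameter, and the names `adaptive_weights` and `aggregate_updates` are assumptions made for illustration, and the LoRA parameters are stood in for by flat lists of floats.

```python
from typing import Dict, List


def adaptive_weights(num_examples: List[int],
                     val_scores: List[float],
                     alpha: float = 0.5) -> List[float]:
    """Blend each client's data-volume share and performance share.

    Assumption: the weight is a convex combination of the two shares,
    controlled by a hypothetical mixing parameter `alpha`. The weights
    sum to 1 by construction.
    """
    total_n = sum(num_examples)
    total_s = sum(val_scores)
    if total_n <= 0 or total_s <= 0:
        raise ValueError("need positive total data volume and total score")
    return [
        alpha * (n / total_n) + (1.0 - alpha) * (s / total_s)
        for n, s in zip(num_examples, val_scores)
    ]


def aggregate_updates(updates: List[Dict[str, List[float]]],
                      weights: List[float]) -> Dict[str, List[float]]:
    """Weighted average of per-client parameter updates, tensor by tensor."""
    aggregated: Dict[str, List[float]] = {}
    for name in updates[0]:
        length = len(updates[0][name])
        aggregated[name] = [
            sum(w * u[name][i] for u, w in zip(updates, weights))
            for i in range(length)
        ]
    return aggregated


# Two hypothetical schools with equal validation scores but different data sizes.
weights = adaptive_weights(num_examples=[100, 300], val_scores=[0.9, 0.9])
client_updates = [
    {"lora_A": [1.0, 0.0]},  # update from school 1 (illustrative values)
    {"lora_A": [0.0, 1.0]},  # update from school 2 (illustrative values)
]
global_update = aggregate_updates(client_updates, weights)
```

With `alpha = 1.0` the weighting reduces to a purely data-volume-based average in the style of FedAvg, while `alpha = 0.0` weights clients by validation performance alone. Only the small LoRA updates pass through the aggregation step, so raw student responses never leave the client.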
