Temperature Scaling Attack Disrupting Model Confidence in Federated Learning

Kichang Lee, Jaeho Jin, JaeYeon Park, Songkuk Kim, JeongGil Ko

arXiv:2602.06638v2

h-index: 5

Originality: Highly original

AI Analysis

This work identifies a security vulnerability in federated learning systems: manipulating model confidence can disrupt mission-critical applications such as healthcare and autonomous driving. It frames confidence calibration as a new attack surface rather than an incremental improvement on existing attacks.

The paper treats model confidence calibration as a new attack objective in federated learning. It introduces the Temperature Scaling Attack (TSA), which degrades calibration while preserving accuracy, producing up to a 145% increase in calibration error on CIFAR-100 and up to 7.2x more missed critical cases (healthcare) or false alarms (autonomous driving) in the case studies.

Predictive confidence serves as a foundational control signal in mission-critical systems, directly governing risk-aware logic such as escalation, abstention, and conservative fallback. While prior federated learning attacks predominantly target accuracy or implant backdoors, we identify confidence calibration as a distinct attack objective. We present the Temperature Scaling Attack (TSA), a training-time attack that degrades calibration while preserving accuracy. By injecting temperature scaling with learning rate-temperature coupling during local training, malicious updates maintain benign-like optimization behavior, evading accuracy-based monitoring and similarity-based detection. We provide a convergence analysis under non-IID settings, showing that this coupling preserves standard convergence bounds while systematically distorting confidence. Across three benchmarks, TSA substantially shifts calibration (e.g., 145% error increase on CIFAR-100) with <2% accuracy change, and remains effective under robust aggregation and post-hoc calibration defenses. Case studies further show that confidence manipulation can cause up to 7.2x increases in missed critical cases (healthcare) or false alarms (autonomous driving), even when accuracy is unchanged. Overall, our results establish calibration integrity as a critical attack surface in federated learning.
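The abstract does not spell out the exact form of the temperature injection or the coupling rule, so the following is only a minimal sketch of the underlying mechanics, not the authors' implementation. It assumes the standard formulation in which logits are divided by a temperature T before the softmax, and it uses expected calibration error (ECE) as one common calibration metric; the metric choice and the helper names (`softmax`, `ce_grad_wrt_logits`, `expected_calibration_error`) are illustrative assumptions. The sketch shows two facts that hold under that formulation: temperature changes confidence without changing the predicted class, and the cross-entropy gradient with respect to the logits shrinks by a factor of 1/T, which is one plausible reason to scale the learning rate together with T.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad_wrt_logits(logits, label, temperature=1.0):
    """Gradient of cross-entropy(softmax(z / T), label) with respect to z.

    Under this standard formulation the gradient is (p_T - onehot) / T.
    """
    probs = softmax(logits, temperature)
    return [(p - (1.0 if i == label else 0.0)) / temperature
            for i, p in enumerate(probs)]


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted mean of |accuracy - confidence| over equal-width bins.

    ECE is an assumed metric here; the source only says "error increase".
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, ok in b if ok) / len(b)
            ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


# Hypothetical logits for a 3-class example.
logits = [2.0, 1.0, 0.1]
p_base = softmax(logits, 1.0)    # reference confidence
p_soft = softmax(logits, 4.0)    # T > 1 flattens the distribution
p_sharp = softmax(logits, 0.25)  # T < 1 sharpens the distribution

# Hypothetical coupling: multiply the learning rate by T so that the
# effective step on the logits matches the unscaled (p_T - onehot) term.
T = 4.0
base_lr = 0.1
coupled_lr = base_lr * T
grad_T = ce_grad_wrt_logits(logits, label=0, temperature=T)
step_coupled = [coupled_lr * g for g in grad_T]

print("max confidence:", max(p_soft), max(p_base), max(p_sharp))
print("coupled step on logits:", step_coupled)
```

Because the ranking of the logits is unchanged by dividing by a positive T, accuracy-based monitoring sees the same predictions while the reported confidence shifts, which is the general effect the abstract describes.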
