LG · Sep 25, 2025

DATS: Distance-Aware Temperature Scaling for Calibrated Class-Incremental Learning

arXiv:2509.21161v1 · 4.1 · h-index: 2

Originality: Highly original

AI Analysis

This work addresses the need for reliable uncertainty calibration in safety-critical continual learning applications, offering a more stable solution than prior data-centric approaches.

The paper tackles the problem of maintaining calibrated uncertainty in class-incremental learning, where existing methods produce large fluctuations in calibration error across tasks. It proposes DATS, which reduces calibration error by adapting the temperature to task distance, without requiring task information at test time.

Continual Learning (CL) has recently gained increasing attention for its ability to let a single model learn incrementally from a sequence of new classes. In this scenario, it is important to keep predictive performance consistent across all classes and to prevent so-called Catastrophic Forgetting (CF). However, in safety-critical applications, predictive performance alone is insufficient. Predictive models should also communicate their uncertainty reliably and in a calibrated manner, that is, with confidence scores aligned to the true frequencies of target events. Existing approaches in CL address calibration primarily from a data-centric perspective, relying on a single temperature shared across all tasks. Such solutions overlook task-specific differences, leading to large fluctuations in calibration error across tasks. For this reason, we argue that a more principled approach should adapt the temperature according to the distance to the current task. However, task information is unavailable at test time and during deployment, which makes this objective hard to achieve. To this end, we propose Distance-Aware Temperature Scaling (DATS), which combines prototype-based distance estimation with distance-aware calibration to infer task proximity and assign adaptive temperatures without prior task information. In an extensive empirical evaluation on both standard benchmarks and real-world, imbalanced datasets from the biomedical domain, our approach proves stable, reliable, and consistent in reducing calibration error across tasks compared to state-of-the-art approaches.
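
The idea of a distance-dependent temperature can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how distances are mapped to temperatures, so the linear mapping, the parameters `base_temp` and `slope`, the use of a single current-task prototype, and all function names below are assumptions chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distance_to_prototype(feature, prototype):
    """Euclidean distance between a feature vector and a task prototype."""
    return math.dist(feature, prototype)


def distance_aware_temperature(distance, base_temp=1.0, slope=0.5):
    """Hypothetical mapping (an assumption, not from the paper):
    the temperature grows linearly with the distance."""
    return base_temp + slope * distance


def calibrated_probs(logits, feature, current_task_prototype,
                     base_temp=1.0, slope=0.5):
    """Scale logits with a temperature derived from the sample's distance
    to the current-task prototype; no task label is needed at test time."""
    d = distance_to_prototype(feature, current_task_prototype)
    t = distance_aware_temperature(d, base_temp, slope)
    return softmax(logits, t)


# Same logits, two samples at different distances from the prototype.
logits = [2.0, 1.0, 0.1]
prototype = [0.0, 0.0]
near = calibrated_probs(logits, [0.0, 0.0], prototype)
far = calibrated_probs(logits, [3.0, 4.0], prototype)
print(max(near), max(far))
```

Under these assumptions, the sample far from the prototype receives a higher temperature and therefore a flatter, less confident distribution than the nearby sample, while a single shared temperature would treat both identically. In practice the mapping parameters would have to be fitted on held-out data rather than fixed by hand.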
