Margin and Consistency Supervision for Calibrated and Robust Vision Models
This work addresses the unreliability and limited robustness of vision models used in safety-critical applications, offering an incremental improvement through a new regularization method.
The paper addresses poor calibration and fragility under distribution shift in deep vision classifiers by introducing Margin and Consistency Supervision (MaCS), a regularization framework that improves calibration (lower ECE and NLL) and robustness to corruptions while preserving or improving top-1 accuracy across several benchmarks and backbones.
Deep vision classifiers often achieve high accuracy while remaining poorly calibrated and fragile under small distribution shifts. We present Margin and Consistency Supervision (MaCS), a simple, architecture-agnostic regularization framework that jointly enforces logit-space separation and local prediction stability. MaCS augments cross-entropy with (i) a hinge-squared margin penalty that enforces a target logit gap between the correct class and the strongest competitor, and (ii) a consistency regularizer that minimizes the KL divergence between predictions on clean inputs and on mildly perturbed views of them. We provide a unifying theoretical analysis showing that increasing the classification margin while reducing local sensitivity, formalized via a Lipschitz-type stability proxy, yields improved generalization guarantees and a provable robustness radius bound that scales with the margin-to-sensitivity ratio. Across several image classification benchmarks and backbones spanning CNNs and Vision Transformers, MaCS consistently improves calibration, measured by lower expected calibration error (ECE) and negative log-likelihood (NLL), and robustness to common corruptions while preserving or improving top-1 accuracy. Our approach requires no additional data, no architectural changes, and negligible inference overhead, making it an effective drop-in replacement for standard training objectives.
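To make the structure of the objective concrete, the following is a minimal sketch of how the three terms described above could be combined. It is an illustration under stated assumptions, not the authors' implementation: the function name `macs_loss`, the weights `lam_margin` and `lam_cons`, the default target gap `margin=1.0`, the KL direction (clean to perturbed), and the mean reduction over the batch are all hypothetical choices that the abstract does not specify. Plain Python is used so the arithmetic is easy to follow; in practice the same computation would be written with a tensor library.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def macs_loss(clean_logits, perturbed_logits, labels,
              margin=1.0, lam_margin=0.5, lam_cons=1.0):
    """Sketch of a MaCS-style objective (all hyperparameters are assumptions).

    clean_logits, perturbed_logits: lists of per-example logit lists.
    labels: list of integer class indices.
    Returns (total_loss, dict of the three averaged terms).
    """
    ce = margin_pen = kl = 0.0
    for z, z_pert, y in zip(clean_logits, perturbed_logits, labels):
        logp = log_softmax(z)
        logp_pert = log_softmax(z_pert)

        # Standard cross-entropy on the clean input.
        ce += -logp[y]

        # Hinge-squared margin penalty: penalize the shortfall of the gap
        # between the correct-class logit and the strongest competitor.
        runner_up = max(v for k, v in enumerate(z) if k != y)
        gap = z[y] - runner_up
        margin_pen += max(0.0, margin - gap) ** 2

        # Consistency term: KL(p_clean || p_perturbed). The direction is an
        # assumption; the abstract only says "KL divergence".
        kl += sum(math.exp(a) * (a - b) for a, b in zip(logp, logp_pert))

    n = len(labels)
    parts = {"ce": ce / n, "margin": margin_pen / n, "consistency": kl / n}
    total = (parts["ce"]
             + lam_margin * parts["margin"]
             + lam_cons * parts["consistency"])
    return total, parts
```

For example, with clean logits `[[3.0, 0.0, 0.0], [0.0, 1.0, 0.5]]` and labels `[0, 1]`, the first example already exceeds the target gap of 1.0 and contributes no margin penalty, while the second has a gap of 0.5 and contributes `(1.0 - 0.5) ** 2 = 0.25`, giving a batch-averaged margin term of 0.125. Passing the clean logits as the perturbed view makes the consistency term vanish, which is a quick sanity check on the KL computation.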