SAFE-KD: Risk-Controlled Early-Exit Distillation for Vision Backbones
This work addresses a practical challenge in deploying early-exit networks for vision tasks by providing risk-control guarantees, though it is an incremental improvement that builds on existing distillation and conformal methods.
The paper tackles the problem of knowing when early exit is safe in vision networks. It introduces SAFE-KD, a wrapper that combines hierarchical distillation with conformal risk control to keep the misclassification risk among early-exiting samples below a user-specified level. The authors report improved accuracy-compute trade-offs and robust performance across datasets and architectures.
Early-exit networks reduce inference cost by allowing ``easy'' inputs to stop early, but practical deployment hinges on knowing \emph{when} early exit is safe. We introduce SAFE-KD, a universal multi-exit wrapper for modern vision backbones that couples hierarchical distillation with \emph{conformal risk control}. SAFE-KD attaches lightweight exit heads at intermediate depths, distills a strong teacher into all exits via Decoupled Knowledge Distillation (DKD), and enforces deep-to-shallow consistency between exits. At inference, we calibrate per-exit stopping thresholds on a held-out set using conformal risk control (CRC) to keep the \emph{selective} misclassification risk (the error rate among the samples that exit early) below a user-specified level under exchangeability. Across multiple datasets and architectures, SAFE-KD yields improved accuracy-compute trade-offs, stronger calibration, and robust performance under corruption, while providing finite-sample risk guarantees.
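To make the calibration step concrete, the following is a minimal sketch of how a single exit's stopping threshold might be chosen from held-out data, and how the resulting thresholds could route a sample at inference. It is an illustration under stated assumptions, not the paper's procedure: the function names `calibrate_exit_threshold` and `route` are hypothetical, the confidence score is assumed to be the exit's maximum softmax probability, and the conservative estimate `(errors + 1) / (k + 1)` is a simple CRC-style finite-sample correction for a 0/1 loss. The exact calibration rule and its guarantee in SAFE-KD may differ.

```python
import numpy as np


def calibrate_exit_threshold(conf, correct, alpha):
    """Pick a confidence threshold for one exit from held-out data.

    conf    : confidence score of this exit on each calibration sample
    correct : whether this exit's prediction was correct on each sample
    alpha   : target selective risk (error rate among early-exiting samples)

    Returns the smallest threshold whose conservative selective-risk
    estimate is <= alpha, or +inf if no threshold qualifies (the exit is
    then never used). Assumption: ties in `conf` are rare.
    """
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=bool)

    # Sort from most to least confident; accepting the top-k samples
    # corresponds to the threshold tau = k-th largest confidence.
    order = np.argsort(-conf, kind="stable")
    sorted_conf = conf[order]
    errors = np.cumsum(~correct[order])      # mistakes among the top-k
    k = np.arange(1, conf.size + 1)          # number of accepted samples

    # Conservative estimate: count one extra worst-case sample
    # (0/1 loss is bounded by 1). This is an illustrative correction.
    adjusted_risk = (errors + 1) / (k + 1)

    ok = np.nonzero(adjusted_risk <= alpha)[0]
    if ok.size == 0:
        return float("inf")
    return float(sorted_conf[ok[-1]])


def route(confs_per_exit, thresholds):
    """Return the index of the first exit that clears its threshold.

    confs_per_exit : confidences of all exits, shallow to deep
    thresholds     : one threshold per early exit (all but the last)
    The final exit is the fallback and always accepts.
    """
    for i, (c, t) in enumerate(zip(confs_per_exit[:-1], thresholds)):
        if c >= t:
            return i
    return len(confs_per_exit) - 1


if __name__ == "__main__":
    # Toy calibration data for a single exit (values are made up).
    conf = [0.99, 0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
    correct = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
    tau = calibrate_exit_threshold(conf, correct, alpha=0.15)
    print("threshold:", tau)
    print("chosen exit:", route([0.5, 0.7, 0.9], [tau, 0.65]))
```

In this sketch a threshold of `+inf` disables an exit: when the calibration set cannot support the requested risk level, every sample falls through to a deeper exit. Each early exit would be calibrated separately, ideally on the samples that actually reach it.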