LG · AI · Sep 15, 2025

Know What You Don't Know: Selective Prediction for Early Exit DNNs

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

arXiv:2509.11520v1 · 4.1 · h-index: 16 · Has Code · AI · ML · Systems

Originality: Incremental advance

AI Analysis

This work addresses the inference latency and trustworthiness problems that arise when deploying DNNs in critical applications. It is an incremental advance that combines early exit with selective prediction.

The paper tackles overconfidence in early exit deep neural networks, which can lead to untrustworthy predictions. It proposes SPEED, a method that applies selective prediction through deferral classifiers to identify hard samples and defer them to an expert. The authors report a 50% reduction in the risk of wrong predictions and a 2.05x speedup compared to using only the final layer.

Inference latency and trustworthiness of Deep Neural Networks (DNNs) are the bottlenecks in deploying them in critical applications such as sensitive tasks. Early Exit (EE) DNNs overcome the latency issue by allowing samples to exit from intermediary layers if they attain "high" confidence scores on the predicted class. However, DNNs are known to exhibit overconfidence, which can lead to many samples exiting early and render EE strategies untrustworthy. We use Selective Prediction (SP) to overcome this issue by checking the "hardness" of the samples rather than relying on the confidence score alone. We propose SPEED, a novel approach that uses Deferral Classifiers (DCs) at each layer to check the hardness of samples before performing EEs. Specifically, the DCs identify whether a sample is hard to predict at an intermediary layer, which could lead to hallucination, and defer it to an expert. Detecting hard samples early in inference avoids wasting computational resources and improves trust by deferring those samples to the expert. We demonstrate that EE aided with SP improves both accuracy and latency. Our method reduces the risk of wrong predictions by 50% with a speedup of 2.05x compared to the final layer. The anonymized source code is available at https://github.com/Div290/SPEED
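The control flow described in the abstract (check hardness at each exit, defer if hard, otherwise exit early when confident) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `speed_style_inference`, the thresholds `exit_threshold` and `defer_threshold`, the representation of each exit as a `(logits, hardness)` pair, and the fallback to the last exit's prediction are all assumptions made for illustration. In practice the hardness score would come from a trained deferral classifier; here it is passed in as a precomputed number.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def speed_style_inference(
    exits: Sequence[Tuple[Sequence[float], float]],
    exit_threshold: float = 0.9,   # hypothetical confidence threshold
    defer_threshold: float = 0.5,  # hypothetical hardness threshold
) -> Tuple[str, int, Optional[int]]:
    """Walk the exits in depth order.

    Each entry of `exits` is (class logits, hardness score in [0, 1]).
    Returns (decision, exit index, predicted class or None), where the
    decision is "defer", "exit", or "final".
    """
    last_pred = None
    for i, (logits, hardness) in enumerate(exits):
        # Hardness is checked BEFORE confidence, so an overconfident
        # prediction on a hard sample is deferred rather than emitted.
        if hardness >= defer_threshold:
            return ("defer", i, None)
        probs = softmax(logits)
        conf = max(probs)
        last_pred = probs.index(conf)
        if conf >= exit_threshold:
            return ("exit", i, last_pred)
    # Assumption: if no exit fires, use the last exit's prediction.
    return ("final", len(exits) - 1, last_pred)


if __name__ == "__main__":
    sample = [([2.0, 1.0], 0.1), ([5.0, 0.0], 0.2)]
    print(speed_style_inference(sample))
```

Checking hardness first is the design choice that separates this from plain confidence-thresholded early exit: a sample with very high softmax confidence is still deferred if its hardness score crosses the threshold.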
