Zero Time Waste: Recycling Predictions in Early Exit Neural Networks
This paper addresses the problem of reducing the processing time of large deep learning models in real-world applications, and it represents an incremental improvement over existing early exit methods.
The paper tackles the inefficiency of discarding predictions from intermediate classifiers in early exit neural networks by introducing Zero Time Waste (ZTW), which reuses these predictions through direct connections and ensemble-like combination, achieving a significantly better accuracy vs. inference time trade-off.
The problem of reducing processing time of large deep learning models is a fundamental challenge in many real-world applications. Early exit methods strive towards this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its predictions are discarded, with its computations effectively being wasted. To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across various datasets and architectures to demonstrate that ZTW achieves a significantly better accuracy vs. inference time trade-off than other recently proposed early exit methods.
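As a rough illustration of the two mechanisms named in the abstract, the Python sketch below runs early-exit inference in which (1) each IC receives the output of its predecessor and (2) the outputs of all ICs evaluated so far are combined before the exit decision is made. It is not the authors' implementation. The unweighted geometric mean used for combining, the confidence threshold, the toy classifiers built by `make_toy_ic`, and all function names are assumptions chosen for illustration; the abstract only states that outputs are combined "in an ensemble-like manner".

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def geometric_mean_ensemble(prob_list):
    """Combine probability vectors with an unweighted geometric mean.

    Assumption: a plain geometric mean stands in for the "ensemble-like"
    combination; the abstract does not specify the exact rule.
    """
    eps = 1e-12  # avoids log(0)
    n_classes = len(prob_list[0])
    mean_log = [
        sum(math.log(p[c] + eps) for p in prob_list) / len(prob_list)
        for c in range(n_classes)
    ]
    # softmax(mean of logs) equals the renormalized geometric mean
    return softmax(mean_log)


def early_exit_predict(ics, x, threshold=0.9):
    """Run ICs in order and exit once the combined confidence is high enough.

    ics: list of callables (x, prev_probs) -> logits, where prev_probs is the
    previous IC's output (None for the first IC). This models the direct
    connection between consecutive ICs.
    Returns (predicted_class, exit_index, combined_probs).
    """
    prev_probs = None
    history = []  # outputs of every IC evaluated so far; nothing is discarded
    for idx, ic in enumerate(ics):
        probs = softmax(ic(x, prev_probs))
        history.append(probs)
        combined = geometric_mean_ensemble(history)
        is_last = idx == len(ics) - 1
        if max(combined) >= threshold or is_last:
            return combined.index(max(combined)), idx, combined
        prev_probs = probs


def make_toy_ic(scale):
    """Hypothetical stand-in for a trained IC attached to some layer."""
    def ic(x, prev_probs):
        logits = [scale * v for v in x]
        if prev_probs is not None:
            # Reuse the predecessor's prediction as an additive log-prior.
            logits = [z + math.log(p + 1e-12) for z, p in zip(logits, prev_probs)]
        return logits
    return ic


if __name__ == "__main__":
    ics = [make_toy_ic(0.5), make_toy_ic(1.0), make_toy_ic(2.0)]
    label, exit_idx, probs = early_exit_predict(ics, [2.0, 0.0, 0.0], threshold=0.75)
    print(f"class={label} exit_ic={exit_idx} confidence={max(probs):.3f}")
```

Lowering `threshold` makes the model exit at earlier ICs and saves computation, while raising it defers the decision to deeper ICs. In either case, the predictions of ICs that did not trigger an exit still contribute to the final answer through `history`.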