Adaptive Inference through Early-Exit Networks: Design, Challenges and Directions
This survey provides a comprehensive review for researchers working on efficient AI inference, but its contribution is incremental: it synthesizes existing work rather than introducing novel methods.
This paper surveys early-exit networks as an adaptive inference method for efficient deep neural network deployment, analyzing their design components, recent advances, and challenges, but does not present new experimental results or concrete performance numbers.
DNNs are becoming less and less over-parametrised due to recent advances in efficient model design, through carefully hand-crafted or NAS-based methods. Relying on the fact that not all inputs require the same amount of computation to yield a confident prediction, adaptive inference is gaining attention as a prominent approach for pushing the limits of efficient deployment. In particular, early-exit networks constitute an emerging direction for tailoring the computation depth of each input sample at runtime, offering performance gains complementary to other efficiency optimisations. In this paper, we decompose the design methodology of early-exit networks into its key components and survey the recent advances in each of them. We also position early-exiting against other efficient inference solutions and provide our insights on the current challenges and most promising future directions for research in the field.
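To make the idea of per-sample computation depth concrete, the following is a minimal sketch of one possible exit policy: run the backbone stage by stage and stop at the first intermediate classifier whose top softmax probability reaches a threshold. The policy, the function name `early_exit_predict`, the `threshold` parameter, and the toy stages and heads are all illustrative assumptions; they are not taken from the paper, which surveys a range of designs.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    x: Vector,
    stages: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Return (predicted class, index of the exit that was used).

    Hypothetical policy: exit as soon as the top softmax probability
    of an exit head reaches `threshold`; the last exit always answers.
    """
    if not stages or len(stages) != len(heads):
        raise ValueError("need one exit head per backbone stage")
    h = x
    last = len(stages) - 1
    for i, (stage, head) in enumerate(zip(stages, heads)):
        h = stage(h)                      # compute the next backbone stage
        probs = softmax(head(h))          # intermediate classifier output
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return probs.index(confidence), i
    raise AssertionError("unreachable: the last exit always returns")


# Toy model (assumption): each stage doubles the features, and each
# head uses the features directly as logits.
def double(h: Vector) -> Vector:
    return [2.0 * v for v in h]


def identity_head(h: Vector) -> Vector:
    return list(h)


stages = [double, double, double]
heads = [identity_head, identity_head, identity_head]

easy = early_exit_predict([3.0, 0.0], stages, heads)    # confident at exit 0
medium = early_exit_predict([1.0, 0.0], stages, heads)  # confident at exit 1
hard = early_exit_predict([0.0, 0.1], stages, heads)    # falls through to exit 2
```

In this toy setup the "easy" input leaves after one stage while the "hard" input uses the full depth, which is the behaviour the abstract describes. A real system would replace the toy stages with network blocks and would likely tune the threshold (or use a different exit criterion) per exit.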