The Untapped Potential of Off-the-Shelf Convolutional Neural Networks
This work addresses performance bottlenecks in computer vision through dynamic inference, an approach that improves accuracy without increasing model size or training complexity.
The paper argues that keeping network topology static at inference time limits performance. It shows that by allowing just four layers to change configuration dynamically, off-the-shelf models like ResNet-50 are capable of over 95% accuracy on ImageNet, exceeding models with over 20x more parameters.
In recent years, a myriad of novel convolutional network architectures have been developed to advance state-of-the-art performance on challenging recognition tasks. As computational resources improve, a great deal of effort has gone into efficiently scaling up existing designs and generating new architectures with Neural Architecture Search (NAS) algorithms. While network topology has proven to be a critical factor for model performance, we show that significant gains are being left on the table by keeping topology static at inference time. Because of challenges such as scale variation, we should not expect a static model configured to perform well across a training dataset to be optimally configured for every test input. In this work, we seek to expose the potential of inference-time-dynamic models. By allowing just four layers to dynamically change configuration at inference time, we show that existing off-the-shelf models like ResNet-50 are capable of over 95% accuracy on ImageNet. This level of performance currently exceeds that of models with over 20x more parameters and significantly more complex training procedures.
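To make the idea of changing a layer's configuration without changing its weights concrete, the following is a minimal sketch, not the procedure used in this work. The abstract does not say which properties are reconfigured; stride and dilation are assumed here purely for illustration, and the names `conv1d`, `LAYER_OPTIONS`, and `all_configurations` are hypothetical. A 1D convolution stands in for a 2D convolutional layer.

```python
from itertools import product


def conv1d(signal, weights, stride=1, dilation=1):
    """Valid 1D cross-correlation with configurable stride and dilation.

    The same `weights` are reused for every (stride, dilation) setting,
    so changing the configuration adds no parameters.
    """
    # Number of input positions covered by one output element.
    span = (len(weights) - 1) * dilation + 1
    out = []
    for start in range(0, len(signal) - span + 1, stride):
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(weights)))
    return out


# ASSUMPTION: hypothetical per-layer choices of (stride, dilation).
LAYER_OPTIONS = [(1, 1), (1, 2), (2, 1)]
# The abstract states that four layers are allowed to change configuration.
NUM_DYNAMIC_LAYERS = 4


def all_configurations():
    """Enumerate every joint setting of the dynamic layers."""
    return list(product(LAYER_OPTIONS, repeat=NUM_DYNAMIC_LAYERS))


weights = [1.0, 0.0, -1.0]            # one fixed, "pretrained" filter
signal = [0, 1, 2, 3, 4, 5, 6, 7]

default = conv1d(signal, weights)               # stride 1, dilation 1
dilated = conv1d(signal, weights, dilation=2)   # wider receptive field
strided = conv1d(signal, weights, stride=2)     # coarser output grid
```

With three assumed options per layer and four dynamic layers, `all_configurations()` yields 81 candidate topologies that all share one set of weights, which is why a model of this kind can vary its behavior per input without growing in size.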