Making Convolutional Networks Shift-Invariant Again
The paper addresses the lack of shift-invariance in convolutional networks, a practical concern for deep learning practitioners, and improves model reliability. The contribution is incremental, however, since it applies a known signal processing technique to existing architectures.
The paper tackles the lack of shift-invariance in modern convolutional networks by integrating anti-aliasing (low-pass filtering before downsampling) into existing downsampling layers. This yields higher ImageNet classification accuracy across architectures such as ResNet, as well as better generalization, measured as stability and robustness to input corruptions.
Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe \textit{increased accuracy} in ImageNet classification, across several commonly-used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe \textit{better generalization}, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservingly overlooked in modern deep networks. Code and anti-aliased versions of popular networks are available at https://richzhang.github.io/antialiased-cnns/.
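The effect described in the abstract can be illustrated with a small 1D toy example. The sketch below is not the released code: the function names (`max_dense`, `blur`, `max_blur_pool`, and so on), the circular padding, and the `[1, 2, 1]` binomial filter are all assumptions chosen for illustration. It reads "integrated correctly" as splitting strided max-pooling into a dense (stride 1) max, a low-pass blur, and then subsampling, and compares that against plain strided max-pooling on an input and its one-sample shift.

```python
# Toy 1D comparison of plain max-pooling and an anti-aliased variant.
# All names here are hypothetical; circular padding is an assumption
# made so that a shift is an exact rotation of the signal.

def max_dense(x, k=2):
    """Max over a sliding window of size k, evaluated at every position (stride 1)."""
    n = len(x)
    return [max(x[(i + j) % n] for j in range(k)) for i in range(n)]

def blur(x, taps=(1, 2, 1)):
    """Low-pass filter with normalized binomial taps, centered on each sample."""
    n = len(x)
    total = sum(taps)
    half = len(taps) // 2
    return [
        sum(t * x[(i + j - half) % n] for j, t in enumerate(taps)) / total
        for i in range(n)
    ]

def subsample(x, stride=2):
    """Keep every stride-th sample."""
    return x[::stride]

def max_pool(x, k=2, stride=2):
    """Plain max-pooling: dense max followed directly by subsampling."""
    return subsample(max_dense(x, k), stride)

def max_blur_pool(x, k=2, stride=2, taps=(1, 2, 1)):
    """Anti-aliased variant: dense max, then blur, then subsampling."""
    return subsample(blur(max_dense(x, k), taps), stride)

def shift(x, s):
    """Circularly shift the signal right by s samples."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two equal-length lists."""
    return max(abs(p - q) for p, q in zip(a, b))

signal = [0, 0, 1, 1, 0, 0, 1, 1]
shifted = shift(signal, 1)

plain_gap = max_abs_diff(max_pool(signal), max_pool(shifted))
blurred_gap = max_abs_diff(max_blur_pool(signal), max_blur_pool(shifted))

print("plain max-pool:      ", max_pool(signal), max_pool(shifted))
print("anti-aliased pooling:", max_blur_pool(signal), max_blur_pool(shifted))
print("output change under a 1-sample shift:", plain_gap, "vs", blurred_gap)
```

On this particular signal, a one-sample shift changes the plain max-pooled output far more than the blurred one. The blur does not make the output exactly shift-invariant; it only reduces the sensitivity, and the size of the reduction depends on the signal and on the filter chosen.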