Scale Calibrated Training: Improving Generalization of Deep Networks via Scale-Specific Normalization
The work addresses a practical inference-speed problem for deep learning practitioners by making a network usable across image resolutions, though the contribution is incremental because it builds on existing normalization techniques.
The paper tackles the catastrophic accuracy drop that CNNs suffer when tested on images of lower resolution than those they were trained on. It proposes Scale Calibrated Training (SCT) with Scale-Specific Batch Normalization so that a single network can handle multiple test scales, improving ResNet-50 accuracy on ImageNet by 1.7% at a test size of 224 and by 11.5% at a test size of 128.
Standard convolutional neural networks (CNNs) require consistent image resolutions in the training and testing phases. In practice, however, testing with smaller image sizes is necessary for fast inference. We show that naively evaluating low-resolution images on networks trained with high-resolution images results in a catastrophic accuracy drop in standard CNN architectures. We propose a novel training regime called Scale Calibrated Training (SCT), which allows networks to learn from various input scales simultaneously. With SCT, a single network can provide decent accuracy at multiple test scales. In our analysis, we surprisingly find that vanilla batch normalization can lead to sub-optimal performance in SCT. We therefore equip SCT with a novel normalization scheme, Scale-Specific Batch Normalization, in place of batch normalization. Experimental results show that SCT improves the accuracy of a single ResNet-50 on ImageNet by 1.7% and 11.5% when testing on image sizes of 224 and 128, respectively.
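The abstract does not spell out how Scale-Specific Batch Normalization works. One plausible reading, offered here only as an illustrative sketch, is that the layer keeps a separate set of running statistics for each input scale, so that activations from 128-pixel inputs are not normalized with statistics gathered from 224-pixel inputs. The class name `ScaleSpecificBatchNorm`, its parameters, and the choice to key statistics by image size are all assumptions made for this example; the learnable affine parameters of batch normalization are omitted for brevity.

```python
import math


class ScaleSpecificBatchNorm:
    """Toy per-channel batch norm with separate running stats per input scale.

    Assumption: "scale-specific" is taken to mean one running mean/variance
    per input resolution. This is a hypothetical sketch, not the paper's code.
    """

    def __init__(self, num_channels, scales, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # One independent set of running statistics for every scale.
        self.running_mean = {s: [0.0] * num_channels for s in scales}
        self.running_var = {s: [1.0] * num_channels for s in scales}

    def forward(self, batch, scale, training=True):
        # batch: list of samples, each a list of per-channel activations.
        num_channels = len(batch[0])
        out = [[0.0] * num_channels for _ in batch]
        for c in range(num_channels):
            values = [sample[c] for sample in batch]
            if training:
                # Batch statistics (biased variance), then update only the
                # running statistics that belong to this scale.
                mean = sum(values) / len(values)
                var = sum((v - mean) ** 2 for v in values) / len(values)
                m = self.momentum
                rm, rv = self.running_mean[scale], self.running_var[scale]
                rm[c] = (1 - m) * rm[c] + m * mean
                rv[c] = (1 - m) * rv[c] + m * var
            else:
                # At test time, use the statistics stored for this scale.
                mean = self.running_mean[scale][c]
                var = self.running_var[scale][c]
            denom = math.sqrt(var + self.eps)
            for i, v in enumerate(values):
                out[i][c] = (v - mean) / denom
        return out


# Usage: statistics for the two scales evolve independently.
bn = ScaleSpecificBatchNorm(num_channels=1, scales=[224, 128])
bn.forward([[1.0], [3.0]], scale=224)
bn.forward([[10.0], [30.0]], scale=128)
```

In this sketch, feeding a batch at one scale leaves the statistics of the other scale untouched, which is the property the abstract's motivation suggests: a shared set of statistics would blend activation distributions that differ across resolutions.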