Scale equivariance in CNNs with vector fields
This work addresses the challenge of handling scale variations in image data for computer vision applications, representing an incremental advancement in equivariant neural network design.
The authors address local scale equivariance in Convolutional Neural Networks by applying each filter at multiple scales and encoding the outputs as vector fields. On the scale equivariant task of regressing the scaling factor applied to randomly scaled MNIST digits, this improves performance by over 20%.
We study the effect of injecting local scale equivariance into Convolutional Neural Networks. This is done by applying each convolutional filter at multiple scales. The output is a vector field that encodes, at each location, the maximal activation across scales and the scale at which it occurs, and this field is further processed by the following convolutional layers. As a result, all intermediate representations are locally scale equivariant. We show that this improves the performance of the model by over $20\%$ in the scale equivariant task of regressing the scaling factor applied to randomly scaled MNIST digits. Furthermore, we find that it also helps in scale invariant tasks, such as the classification of randomly scaled digits. This highlights the usefulness of a compact representation that, by preserving scale equivariance internally, can also learn relationships between different local scales.
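To make the mechanism concrete, the following is a minimal NumPy sketch of a single layer of this kind, not the authors' implementation. It applies one filter at several scales, keeps the strongest response per pixel, and writes the result as a 2D vector whose magnitude is that response and whose angle stands for the winning scale. The function names (`rescale_filter`, `correlate_same`, `scale_vector_field`), the bilinear filter resampling, the gain normalization, and the mapping of log-scale to an angle in $[0, \pi/2]$ are all assumptions made for illustration.

```python
import numpy as np


def rescale_filter(w, s):
    """Resample a square filter by factor s (bilinear; an assumed scheme)."""
    k = w.shape[0]
    k_new = max(1, int(round(k * s)))
    pos = np.linspace(0, k - 1, k_new)      # sample positions in the original filter
    i0 = np.floor(pos).astype(int)
    i1 = np.minimum(i0 + 1, k - 1)
    t = pos - i0
    rows = w[i0] * (1 - t)[:, None] + w[i1] * t[:, None]
    out = rows[:, i0] * (1 - t)[None, :] + rows[:, i1] * t[None, :]
    # Assumption: compensate for the change in filter area so that
    # responses at different scales are roughly comparable.
    return out * (k / k_new) ** 2


def correlate_same(img, w):
    """Zero-padded cross-correlation whose output has the shape of img."""
    k = w.shape[0]
    before = (k - 1) // 2
    after = k - 1 - before
    padded = np.pad(img, ((before, after), (before, after)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.einsum("ijkl,kl->ij", windows, w)


def scale_vector_field(img, w, scales):
    """Return (u, v): per-pixel max response over scales, encoded as a vector.

    Requires at least two distinct scales. The angle encoding is hypothetical:
    log-scale is mapped linearly onto [0, pi/2].
    """
    responses = np.stack([correlate_same(img, rescale_filter(w, s)) for s in scales])
    best = responses.argmax(axis=0)                       # index of winning scale
    magnitude = np.maximum(responses.max(axis=0), 0.0)    # rectified max response
    log_s = np.log(np.asarray(scales, dtype=float))
    angles = (log_s - log_s.min()) / np.ptp(log_s) * (np.pi / 2)
    theta = angles[best]
    return magnitude * np.cos(theta), magnitude * np.sin(theta)


# Small demonstration on random data.
rng = np.random.default_rng(0)
demo_img = rng.standard_normal((12, 12))
demo_filter = rng.standard_normal((3, 3))
demo_u, demo_v = scale_vector_field(demo_img, demo_filter, scales=(1.0, 1.5, 2.0))
```

In a full network of this type, the next layer would take both components `(u, v)` as input, so that later filters can respond to the scale information carried by the angle as well as to the response strength.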