Scaling-Translation-Equivariant Networks with Decomposed Convolutional Filters
This work addresses the challenge of handling multiscale inputs in computer vision and offers a more efficient and robust approach to tasks such as image classification. The contribution is incremental, however, since it builds on existing equivariant-network concepts.
The paper tackles the problem of encoding scale information in convolutional neural networks for multiscale image tasks. It proposes a scaling-translation-equivariant network with decomposed filters, which achieves significantly improved performance in multiscale image classification and better interpretability at a reduced model size.
Encoding the scale information explicitly into the representation learned by a convolutional neural network (CNN) is beneficial for many computer vision tasks, especially when dealing with multiscale inputs. In this paper, we study a scaling-translation-equivariant (ST-equivariant) CNN with joint convolutions across the space and the scaling group, which is shown to be both sufficient and necessary to achieve equivariance for the regular representation of the scaling-translation group ST. To reduce the model complexity and computational burden, we decompose the convolutional filters under two pre-fixed separable bases and truncate the expansion to low-frequency components. A further benefit of the truncated filter expansion is the improved deformation robustness of the equivariant representation, a property which is theoretically analyzed and empirically verified. Numerical experiments demonstrate that the proposed scaling-translation-equivariant network with decomposed convolutional filters (ScDCFNet) achieves significantly improved performance in multiscale image classification and better interpretability than regular CNNs at a reduced model size.
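The filter decomposition described above can be illustrated with a minimal NumPy sketch. It assumes a joint filter over space and scale is written as a truncated sum of products of a spatial basis function and a scale basis function, so that only the expansion coefficients are trainable. The specific bases used here (low-frequency 2-D cosine modes in space, Legendre polynomials in scale) and the names spatial_basis, scale_basis and compose_filter are illustrative assumptions, not necessarily the choices made in the paper.

```python
import numpy as np
from numpy.polynomial import legendre


def spatial_basis(size, num_k):
    """Assumed spatial basis: the num_k lowest-frequency 2-D cosine modes
    on a size x size grid. Returns an array of shape (num_k, size, size)."""
    grid = (np.arange(size) + 0.5) / size
    # Order the (p, q) frequency pairs from low to high total frequency.
    modes = sorted(
        ((p, q) for p in range(size) for q in range(size)),
        key=lambda pq: (pq[0] + pq[1], pq[0]),
    )
    return np.stack([
        np.outer(np.cos(np.pi * p * grid), np.cos(np.pi * q * grid))
        for p, q in modes[:num_k]
    ])


def scale_basis(num_scales, num_m):
    """Assumed scale basis: Legendre polynomials of degree 0..num_m-1
    sampled on [-1, 1]. Returns an array of shape (num_m, num_scales)."""
    t = np.linspace(-1.0, 1.0, num_scales)
    return np.stack([
        legendre.legval(t, np.eye(num_m)[m]) for m in range(num_m)
    ])


def compose_filter(coeffs, psi, phi):
    """Build a joint filter of shape (num_scales, size, size) from
    coefficients of shape (num_k, num_m) and the two fixed bases."""
    return np.einsum("km,khw,ms->shw", coeffs, psi, phi)


# Example: a 5 x 5 spatial filter spanning 5 scale channels.
psi = spatial_basis(5, 6)   # 6 spatial basis functions
phi = scale_basis(5, 3)     # 3 scale basis functions
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((6, 3))  # the only trainable quantities
filt = compose_filter(coeffs, psi, phi)
print(filt.shape)   # (5, 5, 5)
print(coeffs.size)  # 18 coefficients instead of 125 free filter entries
```

In this sketch the truncation to low-frequency components corresponds to keeping small values of num_k and num_m, which is what reduces the parameter count relative to a freely parameterized joint filter.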