CVLGSep 17, 2024

Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations

arXiv:2409.11140v26 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work addresses scale invariance in computer vision for image datasets, presenting incremental algorithmic extensions to improve performance on spatial scaling variations.

The paper tackled the problem of scale generalisation in deep networks by evaluating scale-covariant and scale-invariant Gaussian derivative networks on rescaled Fashion-MNIST and CIFAR-10 datasets, achieving better scale generalisation than previously reported methods on STIR datasets, with improvements from techniques like average pooling and scale-channel dropout.

This paper presents an in-depth analysis of the scale generalisation properties of the scale-covariant and scale-invariant Gaussian derivative networks, complemented with both conceptual and algorithmic extensions. For this purpose, Gaussian derivative networks (GaussDerNets) are evaluated on new rescaled versions of the Fashion-MNIST and the CIFAR-10 datasets, with spatial scaling variations over a factor of 4 in the testing data, that are not present in the training data. Additionally, evaluations on the previously existing STIR datasets show that the GaussDerNets achieve better scale generalisation than previously reported for these datasets for other types of deep networks. We first experimentally demonstrate that the GaussDerNets have quite good scale generalisation properties on the new datasets, and that average pooling of feature responses over scales may sometimes also lead to better results than the previously used approach of max pooling over scales. Then, we demonstrate that using a spatial max pooling mechanism after the final layer enables localisation of non-centred objects in image domain, with maintained scale generalisation properties. We also show that regularisation during training, by applying dropout across the scale channels, referred to as scale-channel dropout, improves both the performance and the scale generalisation. In additional ablation studies, we demonstrate that discretisations of GaussDerNets, based on the discrete analogue of the Gaussian kernel in combination with central difference operators, perform best or among the best, compared to a set of other discrete approximations of the Gaussian derivative kernels. Finally, by visualising the activation maps and the learned receptive fields, we demonstrate that the GaussDerNets have very good explainability properties.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes