CVSep 19, 2022
On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural NetworksHubert Leterme, Kévin Polisano, Valérie Perrier et al.
This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification. Specifically, we tackle the instability issue arising in their first layer, which tends to learn parameters that closely resemble oriented band-pass filters when trained on datasets like ImageNet. Subsampled convolutions with such Gabor-like filters are prone to aliasing, causing sensitivity to small input shifts. In this context, we establish conditions under which the max pooling operator approximates a complex modulus, which is nearly shift invariant. We then derive a measure of shift invariance for subsampled convolutions followed by max pooling. In particular, we highlight the crucial role played by the filter's frequency and orientation in achieving stability. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform, a particular case of discrete Gabor-like decomposition.
CVDec 1, 2022
From CNNs to Shift-Invariant Twin Models Based on Complex WaveletsHubert Leterme, Kévin Polisano, Valérie Perrier et al.
We propose a novel method to increase shift invariance and prediction accuracy in convolutional neural networks. Specifically, we replace the first-layer combination "real-valued convolutions + max pooling" (RMax) by "complex-valued convolutions + modulus" (CMod), which is stable to translations, or shifts. To justify our approach, we claim that CMod and RMax produce comparable outputs when the convolution kernel is band-pass and oriented (Gabor-like filter). In this context, CMod can therefore be considered as a stable alternative to RMax. To enforce this property, we constrain the convolution kernels to adopt such a Gabor-like structure. The corresponding architecture is called mathematical twin, because it employs a well-defined mathematical operator to mimic the behavior of the original, freely-trained model. Our approach achieves superior accuracy on ImageNet and CIFAR-10 classification tasks, compared to prior methods based on low-pass filtering. Arguably, our approach's emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance. Furthermore, it has a lower computational cost and memory footprint than concurrent work, making it a promising solution for practical implementation.
CVMar 19, 2025
Disentangling Modes and Interference in the Spectrogram of Multicomponent SignalsKévin Polisano, Sylvain Meignen, Nils Laurent et al.
In this paper, we investigate how the spectrogram of multicomponent signals can be decomposed into a mode part and an interference part. We explore two approaches: (i) a variational method inspired by texture-geometry decomposition in image processing, and (ii) a supervised learning approach using a U-Net architecture, trained on a dataset encompassing diverse interference patterns and noise conditions. Once the interference component is identified, we explain how it enables us to define a criterion to locally adapt the window length used in the definition of the spectrogram, for the sake of improving ridge detection in the presence of close modes. Numerical experiments illustrate the advantages and limitations of both approaches for spectrogram decomposition, highlighting their potential for enhancing time-frequency analysis in the presence of strong interference.
CVMar 19, 2025
Manifold Learning for Hyperspectral ImagesFethi Harkat, Guillaume Gey, Valérie Perrier et al.
Traditional feature extraction and projection techniques, such as Principal Component Analysis, struggle to adequately represent X-Ray Transmission (XRT) Multi-Energy (ME) images, limiting the performance of neural networks in decision-making processes. To address this issue, we propose a method that approximates the dataset topology by constructing adjacency graphs using the Uniform Manifold Approximation and Projection. This approach captures nonlinear correlations within the data, significantly improving the performance of machine learning algorithms, particularly in processing Hyperspectral Images (HSI) from X-ray transmission spectroscopy. This technique not only preserves the global structure of the data but also enhances feature separability, leading to more accurate and robust classification results.
CVMar 18, 2024
Gridless 2D Recovery of Lines using the Sliding Frank-Wolfe AlgorithmKévin Polisano, Basile Dubois-Bonnaire, Sylvain Meignen
We present a new approach leveraging the Sliding Frank--Wolfe algorithm to address the challenge of line recovery in degraded images. Building upon advances in conditional gradient methods for sparse inverse problems with differentiable measurement models, we propose two distinct models tailored for line detection tasks within the realm of blurred line deconvolution and ridge detection of linear chirps in spectrogram images.