CV · Apr 16

Frequency-Enhanced Dual-Subspace Networks for Few-Shot Fine-Grained Image Classification

Meijia Wang, Guochao Wang, Haozhen Chu, Bin Yao, Weichuan Zhang, Yuan Wang, Junpo Yang

arXiv:2604.14958 · 9.8 · h-index: 4

AI Analysis

For researchers in fine-grained visual recognition, this work addresses the bottleneck of texture bias and overfitting in few-shot settings by introducing a frequency-enhanced dual-subspace approach.

The paper tackles few-shot fine-grained image classification, where models often overfit to texture biases and background noise. The proposed Frequency-Enhanced Dual-Subspace Network (FEDSNet) uses DCT and low-pass filtering to isolate structural features, and fuses spatial and frequency subspaces via an adaptive gating mechanism, achieving competitive results on four benchmarks (e.g., CUB-200-2011, Stanford Cars) with improved robustness.

Few-shot fine-grained image classification aims to recognize subcategories with high visual similarity using only a limited number of annotated samples. Existing metric learning-based methods typically rely solely on spatial domain features. Confined to this single perspective, models inevitably suffer from inherent texture biases, entangling essential structural details with high-frequency background noise. Furthermore, lacking cross-view geometric constraints, single-view metrics tend to overfit this noise, resulting in structural instability under few-shot conditions. To address these issues, this paper proposes the Frequency-Enhanced Dual-Subspace Network (FEDSNet). Specifically, FEDSNet utilizes the Discrete Cosine Transform (DCT) and a low-pass filtering mechanism to explicitly isolate low-frequency global structural components from spatial features, thereby suppressing background interference. Truncated Singular Value Decomposition (SVD) is employed to construct independent, low-rank linear subspaces for both spatial texture and frequency structural features. An adaptive gating mechanism is designed to dynamically fuse the projection distances from these dual views. This strategy leverages the structural stability of the frequency subspace to prevent the spatial subspace from overfitting to background features. Extensive experiments on four benchmark datasets - CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft - demonstrate that FEDSNet exhibits excellent classification performance and robustness, achieving highly competitive results compared to existing metric learning algorithms. Complexity analysis further confirms that the proposed network achieves a favorable balance between high accuracy and computational efficiency, providing an effective new paradigm for few-shot fine-grained visual recognition.
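The abstract names three steps: DCT low-pass filtering, truncated-SVD subspaces, and gated fusion of projection distances. The NumPy sketch below illustrates how they could fit together. It is not the authors' implementation. The function names, the square low-frequency mask (`keep`), the subspace rank, the use of per-location channel vectors as descriptors, and the fixed scalar `gate_logit` (standing in for the learned adaptive gate) are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size (n, n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def low_pass_dct(fmap, keep):
    """Zero all but the keep x keep lowest-frequency 2-D DCT coefficients.

    fmap has shape (..., H, W); the transform is applied per channel.
    The square mask is an assumed filter shape.
    """
    h, w = fmap.shape[-2:]
    dh, dw = dct_matrix(h), dct_matrix(w)
    coeffs = dh @ fmap @ dw.T           # forward 2-D DCT
    mask = np.zeros((h, w))
    mask[:keep, :keep] = 1.0
    return dh.T @ (coeffs * mask) @ dw  # inverse 2-D DCT

def subspace_basis(support, rank):
    """Truncated SVD of class descriptors: (K, C, H, W) -> basis (C, rank)."""
    c = support.shape[1]
    desc = support.transpose(1, 0, 2, 3).reshape(c, -1)  # (C, K*H*W)
    u, _, _ = np.linalg.svd(desc, full_matrices=False)
    return u[:, :rank]

def projection_distance(query, basis):
    """Mean squared residual of query descriptors after projection."""
    q = query.reshape(query.shape[0], -1)                # (C, H*W)
    resid = q - basis @ (basis.T @ q)
    return float(np.mean(np.sum(resid ** 2, axis=0)))

def fused_distance(query, support, rank=2, keep=2, gate_logit=0.0):
    """Blend spatial and frequency projection distances with a scalar gate.

    gate_logit is a hypothetical placeholder for the adaptive gate.
    """
    d_spatial = projection_distance(query, subspace_basis(support, rank))
    q_freq = low_pass_dct(query, keep)
    s_freq = low_pass_dct(support, keep)
    d_freq = projection_distance(q_freq, subspace_basis(s_freq, rank))
    gate = 1.0 / (1.0 + np.exp(-gate_logit))
    return gate * d_freq + (1.0 - gate) * d_spatial
```

Under these assumptions, a query would be assigned to the class whose support set yields the smallest `fused_distance`. Setting `keep` equal to the feature-map size disables the filter, which is a convenient sanity check.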
