CV AI Sep 25, 2024

The Overfocusing Bias of Convolutional Neural Networks: A Saliency-Guided Regularization Approach

arXiv:2409.17370v1
3 citations
h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses a generalization issue in CNNs for computer vision, particularly in low-data settings, but the advance is incremental because it builds on existing regularization methods.

The paper tackles the problem of convolutional neural networks (CNNs) overfocusing on narrow image regions in low-data regimes, which harms generalization, and introduces Saliency Guided Dropout (SGDrop) to address this, showing enhanced generalization and more expansive attributions in experiments.

Despite transformers being considered as the new standard in computer vision, convolutional neural networks (CNNs) still outperform them in low-data regimes. Nonetheless, CNNs often make decisions based on narrow, specific regions of input images, especially when training data is limited. This behavior can severely compromise the model's generalization capabilities, making it disproportionately dependent on certain features that might not represent the broader context of images. While the conditions leading to this phenomenon remain elusive, the primary intent of this article is to shed light on this observed behavior of neural networks. Our research endeavors to prioritize comprehensive insight and to outline an initial response to this phenomenon. In line with this, we introduce Saliency Guided Dropout (SGDrop), a pioneering regularization approach tailored to address this specific issue. SGDrop utilizes attribution methods on the feature map to identify and then reduce the influence of the most salient features during training. This process encourages the network to diversify its attention and not focus solely on specific standout areas. Our experiments across several visual classification benchmarks validate SGDrop's role in enhancing generalization. Significantly, models incorporating SGDrop display more expansive attributions and neural activity, offering a more comprehensive view of input images in contrast to their traditionally trained counterparts.
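The masking idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the attribution method, the masking rule, or any hyperparameters. The sketch assumes a flattened feature vector feeding a linear classification head, uses gradient-times-activation as the attribution (for a linear head, the gradient of the class score with respect to a feature is simply its weight), and zeroes out a fixed fraction of the most salient features. The names `saliency_guided_drop`, `class_weights`, and `drop_fraction` are hypothetical.

```python
def saliency_guided_drop(features, class_weights, drop_fraction=0.25):
    """Zero out the most salient features of one training example.

    Assumptions (not taken from the paper): the features feed a linear
    head, so the attribution of feature i for the target class is
    gradient x activation = class_weights[i] * features[i].
    """
    if len(features) != len(class_weights):
        raise ValueError("features and class_weights must have equal length")
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must lie in [0, 1]")

    # Attribution of each feature to the target-class score.
    attributions = [w * f for w, f in zip(class_weights, features)]

    # Rank feature indices from most to least salient.
    ranked = sorted(range(len(features)),
                    key=lambda i: attributions[i], reverse=True)

    # Drop the top fraction; the remaining features pass through unchanged.
    num_drop = int(drop_fraction * len(features))
    dropped = set(ranked[:num_drop])
    masked = [0.0 if i in dropped else f for i, f in enumerate(features)]
    return masked, sorted(dropped)


# Toy example: 8 features, target-class weights of a linear head.
features = [0.1, 2.0, 0.3, 1.5, 0.2, 0.4, 0.05, 0.6]
class_weights = [1.0, 1.5, 0.5, 1.0, -1.0, 0.2, 2.0, 0.5]

masked, dropped = saliency_guided_drop(features, class_weights,
                                       drop_fraction=0.25)
print(dropped)  # indices of the features that were masked
print(masked)   # features passed on to the rest of the network
```

In a training loop, a mask of this kind would be applied only during training, so that the classifier has to find evidence in the remaining, less salient features. At test time the features would be left untouched.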

