CVApr 28

Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation

arXiv:2604.251886.2
Predicted impact top 91% in CV · last 90 daysOriginality Synthesis-oriented
AI Analysis

For computer vision researchers, this work incrementally improves image classification by combining known techniques (dilated convolution, attention) with a stochastic masking mechanism.

RDCNet integrates multi-branch random dilated convolution, fine-grained feature enhancement, and context excitation to improve image classification accuracy and robustness to noise. It achieves state-of-the-art results on five benchmarks, outperforming second-best methods by up to 4.73%.

Image classification remains a fundamental yet challenging task in computer vision, particularly when fine-grained feature extraction and background noise suppression are required simultaneously. Conventional convolutional neural networks, despite their remarkable success in hierarchical feature learning, often struggle with capturing multi-scale contextual information and are susceptible to overfitting when confronted with noisy or irrelevant image regions. In this paper, we propose RDCNet (Image Classification Network with Random Dilated Convolution), a novel architecture built upon ResNet-34 that integrates three synergistic innovations to address these limitations: (1) a Multi-Branch Random Dilated Convolution (MRDC) module that employs parallel branches with varying dilation rates combined with a stochastic masking mechanism to capture fine-grained features across multiple scales while enhancing robustness against noise and overfitting; (2) a Fine-Grained Feature Enhancement (FGFE) module embedded within MRDC that bridges global contextual information with local feature representations through adaptive pooling and bilinear interpolation, thereby amplifying sensitivity to subtle visual patterns; and (3) a Context Excitation (CE) module that leverages softmax-based spatial attention and channel recalibration to dynamically emphasize task-relevant features while suppressing background interference. Extensive experiments conducted on five benchmark datasets -- CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof -- demonstrate that RDCNet consistently achieves state-of-the-art classification accuracy, outperforming the second-best competing methods by margins of 0.02\%, 1.12\%, 0.18\%, 4.73\%, and 3.56\%, respectively, thereby validating the effectiveness and generalizability of the proposed approach across diverse visual recognition scenarios.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes