CV | Nov 8, 2023

SODAWideNet -- Salient Object Detection with an Attention augmented Wide Encoder Decoder network without ImageNet pre-training

Rohit Venkata Sai Dulam, Chandra Kambhamettu

arXiv:2311.04828v22.85 citations | h-index: 2 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient and autonomous model design in computer vision, though the advance is incremental because it builds on existing SOD methods.

The paper tackles the problem of developing a salient object detection model without ImageNet pre-training by proposing SODAWideNet, a wide and shallow encoder-decoder network, which achieves competitive performance on five datasets with variants having 3.03M and 9.03M parameters.

Developing a new Salient Object Detection (SOD) model involves selecting an ImageNet pre-trained backbone and creating novel feature refinement modules that use the backbone features. However, adding new components to a pre-trained backbone requires retraining the whole network on the ImageNet dataset, which takes significant time. Hence, we explore developing a neural network from scratch that is trained directly on SOD without ImageNet pre-training. Such a formulation offers full autonomy to design task-specific components. To that end, we propose SODAWideNet, an encoder-decoder-style network for Salient Object Detection. We deviate from the commonly practiced paradigm of narrow and deep convolutional models to a wide and shallow architecture, resulting in a parameter-efficient deep neural network. To achieve a shallower network, we increase the receptive field from the beginning of the network using a combination of dilated convolutions and self-attention. We therefore propose the Multi Receptive Field Feature Aggregation Module (MRFFAM), which efficiently obtains discriminative features from farther regions at higher resolutions using dilated convolutions. Next, we propose Multi-Scale Attention (MSA), which creates a feature pyramid and efficiently computes attention across multiple resolutions to extract global features from larger feature maps. Finally, we propose two variants, SODAWideNet-S (3.03M parameters) and SODAWideNet (9.03M parameters), that achieve competitive performance against state-of-the-art models on five datasets.
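
The abstract's claim that dilated convolutions enlarge the receptive field early, and that attention on a feature pyramid is cheaper than attention on the full-resolution map, can be checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: the layer configurations, the 64x64 map size, and the downscale factor are hypothetical and are not taken from the paper, and the helper names (`effective_kernel`, `receptive_field`, `attention_pairs`) are invented for this example rather than part of the released code.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span (in input pixels, one axis) covered by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers) -> int:
    """Receptive field along one axis of a stack of convolution layers.

    layers: sequence of (kernel_size, dilation, stride), ordered input to output.
    """
    rf = 1    # a single output pixel sees one input pixel before any layer
    jump = 1  # distance in input pixels between neighbouring outputs
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


def attention_pairs(height: int, width: int, downscale: int = 1) -> int:
    """Query-key pairs for full self-attention on a map pooled by `downscale`."""
    tokens = (height // downscale) * (width // downscale)
    return tokens * tokens


# Hypothetical configurations (assumptions, not the paper's settings).
plain = [(3, 1, 1)] * 3                        # three ordinary 3x3 convolutions
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]    # same depth, dilations 1, 2, 4

print(receptive_field(plain))      # 7
print(receptive_field(dilated))    # 15
print(attention_pairs(64, 64))     # 16777216
print(attention_pairs(64, 64, 4))  # 65536
```

With the same three layers and the same number of weights, the dilated stack roughly doubles the receptive field, and pooling a 64x64 map by a factor of 4 before attention cuts the number of query-key pairs by a factor of 256. This is the general motivation behind combining dilation with multi-resolution attention; the actual MRFFAM and MSA designs are described in the paper itself.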
