CV · Aug 9, 2021

TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network

arXiv:2108.03990v1 · 172 citations
Originality: Incremental advance
AI Analysis

This work aims to improve salient object detection accuracy, but the advance is incremental because it builds on the existing U-Net framework and transformer methods.

The paper tackles RGB-D salient object detection by proposing TriTransNet, which uses a triplet transformer embedding module to enhance high-level features and integrates depth information, achieving state-of-the-art performance with concrete improvements in benchmark metrics.

Salient object detection is a pixel-level dense prediction task that highlights the prominent object in a scene. The U-Net framework has recently become widely used for it, and its successive convolution and pooling operations generate multi-level features that are complementary to each other. Because high-level features contribute more to performance, we propose a triplet transformer embedding module that enhances them by learning long-range dependencies across layers. It is the first to use three transformer encoders with shared weights to enhance multi-level features. By further designing a scale adjustment module to process the input, devising a three-stream decoder to process the output, and attaching depth features to color features for multi-modal fusion, the proposed triplet transformer embedding network (TriTransNet) achieves state-of-the-art performance in RGB-D salient object detection. Experimental results demonstrate the effectiveness of the proposed modules and the competitiveness of TriTransNet.
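The idea of "three transformer encoders with shared weights" can be illustrated with a small sketch. This is not the authors' implementation: it is a dependency-free toy in plain Python that reduces an encoder to a single-head self-attention layer with a residual connection, uses random weights, and assumes the three feature levels have already been brought to the same token count and channel width (the role the abstract assigns to the scale adjustment module). The names `SharedSelfAttention` and `triplet_embed` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


class SharedSelfAttention:
    """Toy encoder (assumption): single-head self-attention plus a residual.

    One instance holds ONE set of projection weights, so calling it on
    several inputs is what "shared weights" means in this sketch.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

        self.dim = dim
        self.w_q, self.w_k, self.w_v = init(), init(), init()

    def __call__(self, tokens):
        q = matmul(tokens, self.w_q)
        k = matmul(tokens, self.w_k)
        v = matmul(tokens, self.w_v)
        k_t = [list(col) for col in zip(*k)]  # transpose of k
        scale = math.sqrt(self.dim)
        # scores[i][j] = (q_i . k_j) / sqrt(dim): every token attends to every token
        scores = [[s / scale for s in row] for row in matmul(q, k_t)]
        attn = [softmax(row) for row in scores]
        out = matmul(attn, v)
        # Residual connection keeps the original feature alongside the attended one
        return [[t + o for t, o in zip(t_row, o_row)] for t_row, o_row in zip(tokens, out)]


def triplet_embed(levels, encoder):
    """Apply the same encoder (same weights) to each of the three feature levels."""
    return [encoder(tokens) for tokens in levels]


# Hypothetical inputs: three feature levels, each flattened to 6 tokens of width 4.
rng = random.Random(1)
dim, n_tokens = 4, 6
levels = [
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
    for _ in range(3)
]
encoder = SharedSelfAttention(dim)
enhanced = triplet_embed(levels, encoder)
```

Reusing one `encoder` object for all three levels means the parameter count does not grow with the number of streams. A full encoder would also include multi-head attention, layer normalization, and a feed-forward block, which are left out here for brevity.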

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
