HyperFusion-Net: Densely Reflective Fusion for Salient Object Detection
This addresses the problem of accurately detecting salient objects in complex scenes for applications like human-computer interaction, representing a strong specific gain rather than a foundational advancement.
The paper tackles salient object detection by proposing HyperFusion-Net, a feature learning framework that decomposes images into reflective pairs and fuses multi-scale features, achieving state-of-the-art performance on seven public datasets with a large margin.
Salient object detection (SOD), which aims to find the most important region of interest and segment the relevant object/item in that area, is an important yet challenging vision task. This problem is inspired by the fact that human seems to perceive main scene elements with high priorities. Thus, accurate detection of salient objects in complex scenes is critical for human-computer interaction. In this paper, we present a novel feature learning framework for SOD, in which we cast the SOD as a pixel-wise classification problem. The proposed framework utilizes a densely hierarchical feature fusion network, named HyperFusion-Net, automatically predicts the most important area and segments the associated objects in an end-to-end manner. Specifically, inspired by the human perception system and image reflection separation, we first decompose input images into reflective image pairs by content-preserving transforms. Then, the complementary information of reflective image pairs is jointly extracted by an interweaved convolutional neural network (ICNN) and hierarchically combined with a hyper-dense fusion mechanism. Based on the fused multi-scale features, our method finally achieves a promising way of predicting SOD. As shown in our extensive experiments, the proposed method consistently outperforms other state-of-the-art methods on seven public datasets with a large margin.