CMA-Net: A Cascaded Mutual Attention Network for Light Field Salient Object Detection
This work targets light field salient object detection, a domain-specific task, and improves both accuracy and inference speed for applications that require precise object segmentation from multi-modal data.
The paper tackles the problem of segmenting salient objects from light field images, which provide multiple modalities such as all-in-focus images and depth maps. It proposes CMA-Net, a cascaded mutual attention network that fuses high-level features from these modalities. CMA-Net achieves state-of-the-art performance on two benchmark datasets and runs inference at 53 fps.
In the past few years, numerous deep learning methods have been proposed to address the task of segmenting salient objects from RGB images. However, these approaches rely on a single modality and fail to achieve state-of-the-art performance on widely used light field salient object detection (SOD) datasets, which collect large-scale natural images and provide multiple modalities such as multi-view images, micro-lens images and depth maps. Recently proposed light field SOD methods have improved detection accuracy, yet they still predict coarse object structures and suffer from slow inference. To this end, we propose CMA-Net, which consists of two novel cascaded mutual attention modules that fuse the high-level features from the all-in-focus and depth modalities. Our proposed CMA-Net outperforms 30 SOD methods on two widely used light field benchmark datasets. In addition, CMA-Net runs inference at 53 fps. Extensive quantitative and qualitative experiments demonstrate both the effectiveness and the efficiency of CMA-Net.
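The abstract does not specify the internals of the mutual attention modules, so the following is only a minimal sketch of the general idea, under stated assumptions: high-level features of each modality are flattened to (positions, channels) arrays, each modality attends to the other via scaled dot-product attention with a residual connection, two such stages are cascaded, and the two outputs are concatenated. The function names (`cross_attend`, `mutual_attention`, `cascaded_fusion`), the absence of learned projections, and the concatenation step are illustrative assumptions, not the authors' design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(query_feat, context_feat):
    # query_feat, context_feat: (num_positions, channels).
    # Each query position gathers context features weighted by similarity.
    # Assumption: no learned query/key/value projections, for brevity.
    d = query_feat.shape[-1]
    scores = query_feat @ context_feat.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Residual connection keeps the original modality features.
    return query_feat + weights @ context_feat


def mutual_attention(rgb_feat, depth_feat):
    # Both directions are computed from the same inputs, so the
    # exchange is symmetric: RGB attends to depth and vice versa.
    rgb_out = cross_attend(rgb_feat, depth_feat)
    depth_out = cross_attend(depth_feat, rgb_feat)
    return rgb_out, depth_out


def cascaded_fusion(rgb_feat, depth_feat, num_stages=2):
    # Assumption: "cascaded" is modeled as stacking the module
    # num_stages times, then concatenating along the channel axis.
    for _ in range(num_stages):
        rgb_feat, depth_feat = mutual_attention(rgb_feat, depth_feat)
    return np.concatenate([rgb_feat, depth_feat], axis=-1)


# Toy inputs: 16 spatial positions with 8 channels per modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))    # stand-in for all-in-focus features
depth = rng.standard_normal((16, 8))  # stand-in for depth features
fused = cascaded_fusion(rgb, depth)
print(fused.shape)  # (16, 16)
```

In a real network the attention would typically operate on learned projections of convolutional feature maps, and the fused features would feed a decoder that predicts the saliency map; those parts are omitted here.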