Pyramid Feature Attention Network for Monocular Depth Prediction
This work addresses depth-prediction accuracy for computer vision applications such as autonomous driving, though it appears incremental because it builds on existing attention mechanisms.
The paper tackles inaccurate spatial layout, ambiguous boundaries, and discontinuous object surfaces in monocular depth estimation by proposing a Pyramid Feature Attention Network (PFANet) with dual-scale channel attention and spatial pyramid attention modules, and reports that it outperforms state-of-the-art methods on the KITTI dataset.
Deep convolutional neural networks (DCNNs) have achieved great success in monocular depth estimation (MDE). However, few existing works take into account how feature maps at different levels contribute to MDE, which leads to inaccurate spatial layout, ambiguous boundaries, and discontinuous object surfaces in the prediction. To better tackle these problems, we propose a Pyramid Feature Attention Network (PFANet) that improves both the high-level context features and the low-level spatial features. In PFANet, we design a Dual-scale Channel Attention Module (DCAM) that applies channel attention at different scales to aggregate global context and local information from the high-level feature maps. To exploit the spatial relationships of visual features, we design a Spatial Pyramid Attention Module (SPAM) that guides the network's attention to multi-scale detailed information in the low-level feature maps. Finally, we introduce a scale-invariant gradient loss to increase the penalty on errors in regions where depth is discontinuous. Experimental results show that our method outperforms state-of-the-art methods on the KITTI dataset.
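The abstract names the scale-invariant gradient loss but does not define it. The following is a minimal pure-Python sketch, assuming the DeMoN-style definition (Ummenhofer et al., 2017): for a spacing h, the gradient at pixel (i, j) is g_h[f](i,j) = ((f(i+h,j) - f(i,j)) / (|f(i+h,j)| + |f(i,j)|), (f(i,j+h) - f(i,j)) / (|f(i,j+h)| + |f(i,j)|)), and the loss sums the L2 distance between predicted and ground-truth gradients over all pixels and spacings. Because each difference is normalized by the magnitudes it compares, a globally rescaled prediction is (almost) not penalized, while sharp depth edges contribute large terms. The spacings (1, 2, 4, 8, 16), the unweighted sum, the `eps` guard, and the function names are all assumptions. The abstract does not say whether PFANet applies the loss to depth or inverse depth, or how it weights the loss against other terms.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def _si_gradient(f: Grid, i: int, j: int, h: int, eps: float) -> Tuple[float, float]:
    """Scale-invariant gradient of map f at pixel (i, j) with spacing h.

    Assumed DeMoN-style form; eps is a hypothetical guard against division by zero.
    """
    c = f[i][j]
    down = f[i + h][j]   # neighbour h rows below
    right = f[i][j + h]  # neighbour h columns to the right
    return (
        (down - c) / (abs(down) + abs(c) + eps),
        (right - c) / (abs(right) + abs(c) + eps),
    )


def scale_invariant_gradient_loss(
    pred: Grid,
    target: Grid,
    spacings: Sequence[int] = (1, 2, 4, 8, 16),  # assumed spacings
    eps: float = 1e-6,
) -> float:
    """Sum over spacings and pixels of ||g_h[pred] - g_h[target]||_2 (sketch)."""
    rows, cols = len(pred), len(pred[0])
    if len(target) != rows or any(len(r) != cols for r in pred + target):
        raise ValueError("pred and target must have the same rectangular shape")
    total = 0.0
    for h in spacings:
        if h >= rows or h >= cols:
            continue  # spacing too large for this map; skip it
        # Only pixels with both a vertical and a horizontal neighbour at distance h.
        for i in range(rows - h):
            for j in range(cols - h):
                px, py = _si_gradient(pred, i, j, h, eps)
                tx, ty = _si_gradient(target, i, j, h, eps)
                total += math.hypot(px - tx, py - ty)
    return total
```

For example, a flat prediction compared with a target that has a vertical depth step (columns of 1 m jumping to 9 m) yields a clearly positive loss. Comparing a map with a copy of itself scaled by 3 yields a value that is essentially zero, which is the scale invariance the loss is named for. In a real training pipeline the same computation would be vectorized over tensors rather than looped in Python.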