CV, RO | Sep 1, 2020

Bidirectional Attention Network for Monocular Depth Estimation

arXiv:2009.00743v2 | 88 citations
AI Analysis

This work addresses a key limitation of convolutional neural networks for depth estimation, namely their difficulty integrating local and global information, and offers a more memory- and compute-efficient approach relevant to applications like autonomous driving and robotics.

The paper tackles the problem of monocular depth estimation by proposing a Bidirectional Attention Network (BANet) to better integrate local and global information, achieving performance on par with or better than state-of-the-art methods on KITTI and DIODE datasets with reduced memory and computational complexity.

In this paper, we propose a Bidirectional Attention Network (BANet), an end-to-end framework for monocular depth estimation (MDE) that addresses the limitation of effectively integrating local and global information in convolutional neural networks. The structure of this mechanism derives from a strong conceptual foundation of neural machine translation, and presents a light-weight mechanism for adaptive control of computation similar to the dynamic nature of recurrent neural networks. We introduce bidirectional attention modules that utilize the feed-forward feature maps and incorporate the global context to filter out ambiguity. Extensive experiments reveal the high degree of capability of this bidirectional attention model over feed-forward baselines and other state-of-the-art methods for monocular depth estimation on two challenging datasets -- KITTI and DIODE. We show that our proposed approach either outperforms or performs at least on a par with the state-of-the-art monocular depth estimation methods with less memory and computational complexity.
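To make the idea of bidirectional attention over feed-forward feature maps concrete, here is a minimal sketch in plain Python. It is not the BANet implementation: the function name `bidirectional_attention`, the use of running averages as the forward and backward context, and the per-pixel softmax over stages are all assumptions chosen for illustration. It shows only the general pattern of sweeping over backbone stages in both directions and using the combined context to weight each stage's contribution at every pixel.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_attention(stage_maps):
    """Fuse per-stage maps with attention built from two directional sweeps.

    stage_maps: list of S stages, each a flat list of N per-pixel values
    (hypothetical stand-ins for feed-forward feature maps brought to a
    common resolution). Returns (fused, weights), where weights[p] holds
    the S attention weights used at pixel p.
    """
    num_stages = len(stage_maps)
    num_pixels = len(stage_maps[0])

    # Forward sweep: context at stage s is the mean of stages 0..s.
    forward = []
    running = [0.0] * num_pixels
    for s in range(num_stages):
        running = [r + v for r, v in zip(running, stage_maps[s])]
        forward.append([r / (s + 1) for r in running])

    # Backward sweep: context at stage s is the mean of stages s..S-1.
    backward = [None] * num_stages
    running = [0.0] * num_pixels
    for s in reversed(range(num_stages)):
        running = [r + v for r, v in zip(running, stage_maps[s])]
        backward[s] = [r / (num_stages - s) for r in running]

    # Assumption: attention logits are the sum of both directional contexts.
    fused, weights = [], []
    for p in range(num_pixels):
        logits = [forward[s][p] + backward[s][p] for s in range(num_stages)]
        w = softmax(logits)
        weights.append(w)
        fused.append(sum(w[s] * stage_maps[s][p] for s in range(num_stages)))
    return fused, weights
```

In a real network the directional contexts and the logits would come from learned convolutional layers rather than fixed averages; the sketch only fixes the data flow so the two sweeps and the per-pixel weighting are easy to see.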

Code Implementations: 1 repo
