CV · Mar 30, 2020

DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning

arXiv:2003.13446v1 · 118 citations
AI Analysis

This paper addresses robustness issues in monocular depth estimation for applications such as autonomous driving. The contribution is incremental, since it builds on existing unsupervised methods.

The paper tackles the lack of robustness of monocular depth estimation in challenging domains such as nighttime scenes. It proposes DeFeat-Net, which learns depth and features simultaneously, and reports a reduction of around 10% in all error measures on nighttime driving sequences.

In the current monocular depth research, the dominant approach is to employ unsupervised training on large datasets, driven by warped photometric consistency. Such approaches lack robustness and are unable to generalize to challenging domains such as nighttime scenes or adverse weather conditions where assumptions about photometric consistency break down. We propose DeFeat-Net (Depth & Feature network), an approach to simultaneously learn a cross-domain dense feature representation, alongside a robust depth-estimation framework based on warped feature consistency. The resulting feature representation is learned in an unsupervised manner with no explicit ground-truth correspondences required. We show that within a single domain, our technique is comparable to both the current state of the art in monocular depth estimation and supervised feature representation learning. However, by simultaneously learning features, depth and motion, our technique is able to generalize to challenging domains, allowing DeFeat-Net to outperform the current state-of-the-art with around 10% reduction in all error measures on more challenging sequences such as nighttime driving.
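The difference between warped photometric consistency and warped feature consistency can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pinhole intrinsics tuple `intr = (fx, fy, cx, cy)`, the pose `(R, t)` and the plain L1 penalty are all assumptions made for illustration, and the paper's actual loss may include further terms. The sketch warps a source view into the target view using predicted depth and relative camera motion, then averages the absolute difference. Passing RGB images gives a photometric loss; passing dense feature maps of the same spatial size gives a feature-consistency loss.

```python
import math

def reproject(u, v, depth, intr, R, t):
    """Map target pixel (u, v) with the given depth to source-view pixel coords."""
    fx, fy, cx, cy = intr  # assumed pinhole intrinsics, shared by both views
    # Back-project the pixel to a 3D point in the target camera frame.
    p = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Move the point into the source camera frame: q = R p + t.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    # Project back to pixel coordinates.
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy

def bilinear(grid, x, y):
    """Sample an H x W x C nested list at float coords, clamped to the border."""
    h, w = len(grid), len(grid[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    return [
        (grid[y0][x0][c] * (1 - wx) + grid[y0][x1][c] * wx) * (1 - wy)
        + (grid[y1][x0][c] * (1 - wx) + grid[y1][x1][c] * wx) * wy
        for c in range(len(grid[0][0]))
    ]

def warped_consistency_loss(target, source, depth, intr, R, t):
    """Mean L1 difference between target and the source warped into the target view."""
    total, count = 0.0, 0
    for v, row in enumerate(target):
        for u, value in enumerate(row):
            x, y = reproject(u, v, depth[v][u], intr, R, t)
            warped = bilinear(source, x, y)
            total += sum(abs(a - b) for a, b in zip(value, warped))
            count += len(value)
    return total / count
```

With an identity pose the warp maps every pixel to itself, so the loss between a view and itself is zero. In a training setup, `target` and `source` would be either raw frames or the outputs of a learned feature network, with `depth` and `(R, t)` supplied by the depth and motion networks.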

Code Implementations: 1 repo