CV · LG · RO · Apr 8, 2020

Self-Supervised Monocular Scene Flow Estimation

arXiv:2004.04143v2 · 121 citations
Originality: Incremental advance
AI Analysis

This work addresses the lack of practical solutions for monocular 3D environment perception in applications such as robotics and autonomous driving. It is an incremental advance because it builds on existing optical flow and depth estimation techniques.

The paper tackles the ill-posed problem of estimating monocular scene flow from two consecutive images. It proposes a single self-supervised CNN that achieves state-of-the-art accuracy among unsupervised and self-supervised monocular scene flow methods while running in real time.

Scene flow estimation has been receiving increasing attention for 3D environment perception. Monocular scene flow estimation -- obtaining 3D structure and 3D motion from two temporally consecutive images -- is a highly ill-posed problem, and practical solutions are lacking to date. We propose a novel monocular scene flow method that yields competitive accuracy and real-time performance. By taking an inverse problem view, we design a single convolutional neural network (CNN) that successfully estimates depth and 3D motion simultaneously from a classical optical flow cost volume. We adopt self-supervised learning with 3D loss functions and occlusion reasoning to leverage unlabeled data. We validate our design choices, including the proxy loss and augmentation setup. Our model achieves state-of-the-art accuracy among unsupervised/self-supervised learning approaches to monocular scene flow, and yields competitive results for the optical flow and monocular depth estimation sub-tasks. Semi-supervised fine-tuning further improves the accuracy and yields promising results in real-time.
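The abstract ties depth, 3D motion, and optical flow together. The link between them is standard pinhole geometry: back-project a pixel with its depth, displace the 3D point by its scene flow, and re-project it to obtain the induced 2D optical flow. The sketch below illustrates only that relation for a single pixel. It is not the paper's network or loss, and the `Intrinsics` container, function names, and numeric values are hypothetical, assuming an ideal pinhole camera without distortion.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Pinhole camera parameters in pixels (hypothetical container)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def backproject(u: float, v: float, depth: float, K: Intrinsics) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with known depth to a 3D point in camera coordinates."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


def project(point: Tuple[float, float, float], K: Intrinsics) -> Tuple[float, float]:
    """Project a 3D camera-space point back onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy)


def induced_optical_flow(
    u: float,
    v: float,
    depth: float,
    scene_flow: Tuple[float, float, float],
    K: Intrinsics,
) -> Tuple[float, float]:
    """2D flow obtained by moving the back-projected point by its 3D scene flow."""
    x, y, z = backproject(u, v, depth, K)
    dx, dy, dz = scene_flow
    u2, v2 = project((x + dx, y + dy, z + dz), K)
    return (u2 - u, v2 - v)


# Hypothetical camera: 100 px focal length, principal point at (50, 50).
K = Intrinsics(fx=100.0, fy=100.0, cx=50.0, cy=50.0)

# A point 10 units away at the principal point, moving 1 unit to the right.
flow_sideways = induced_optical_flow(50.0, 50.0, 10.0, (1.0, 0.0, 0.0), K)

# A point right of centre moving straight away from the camera.
flow_receding = induced_optical_flow(70.0, 50.0, 10.0, (0.0, 0.0, 10.0), K)
```

In this toy setup a receding point drifts toward the principal point in the image, and the same 2D flow could arise from other depth and motion combinations. That ambiguity is one reason the monocular problem is described as ill-posed.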

Code Implementations: 1 repo