CVLGROMay 5, 2021

Self-Supervised Multi-Frame Monocular Scene Flow

arXiv:2105.02216v159 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of accurate and efficient scene flow estimation for applications like autonomous driving, but it is incremental as it builds on existing two-frame baselines.

The paper tackles the problem of estimating 3D scene flow from monocular images, achieving state-of-the-art accuracy on the KITTI dataset while maintaining real-time efficiency through a self-supervised multi-frame network.

Estimating 3D scene flow from a sequence of monocular images has been gaining increased attention due to the simple, economical capture setup. Owing to the severe ill-posedness of the problem, the accuracy of current methods has been limited, especially that of efficient, real-time approaches. In this paper, we introduce a multi-frame monocular scene flow network based on self-supervised learning, improving the accuracy over previous networks while retaining real-time efficiency. Based on an advanced two-frame baseline with a split-decoder design, we propose (i) a multi-frame model using a triple frame input and convolutional LSTM connections, (ii) an occlusion-aware census loss for better accuracy, and (iii) a gradient detaching strategy to improve training stability. On the KITTI dataset, we observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes