CV · Sep 4, 2023

EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity

arXiv:2309.01296v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work targets the accuracy bottleneck in self-supervised monocular scene flow estimation, which matters for applications such as autonomous driving. The advance is incremental: it builds on existing methods, adding new constraints and training strategies.

The paper tackles self-supervised monocular scene flow estimation, which aims to recover 3D structure and 3D motion from two temporally consecutive monocular images. It proposes EMR-MSF, which improves accuracy through a network architecture design borrowed from supervised learning and through explicit geometric constraints. On the KITTI scene flow benchmark, it improves the SF-all metric of the state-of-the-art self-supervised monocular method by 44% and approaches the performance of supervised methods.

Self-supervised monocular scene flow estimation, aiming to understand both 3D structures and 3D motions from two temporally consecutive monocular images, has received increasing attention for its simple and economical sensor setup. However, the accuracy of current methods suffers from the bottleneck of less-efficient network architecture and lack of motion rigidity for regularization. In this paper, we propose a superior model named EMR-MSF by borrowing the advantages of network architecture design under the scope of supervised learning. We further impose explicit and robust geometric constraints with an elaborately constructed ego-motion aggregation module where a rigidity soft mask is proposed to filter out dynamic regions for stable ego-motion estimation using static regions. Moreover, we propose a motion consistency loss along with a mask regularization loss to fully exploit static regions. Several efficient training strategies are integrated including a gradient detachment technique and an enhanced view synthesis process for better performance. Our proposed method outperforms the previous self-supervised works by a large margin and catches up to the performance of supervised methods. On the KITTI scene flow benchmark, our approach improves the SF-all metric of the state-of-the-art self-supervised monocular method by 44% and demonstrates superior performance across sub-tasks including depth and visual odometry, amongst other self-supervised single-task or multi-task methods.
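The abstract describes using a rigidity soft mask so that only static regions constrain the motion estimate. The sketch below illustrates that general idea and is not the paper's formulation: it assumes a static point's 3D displacement under camera motion (R, t) is R·p + t − p, and penalizes the mask-weighted L1 gap between a predicted scene flow and that rigid flow. The function names, the L1 penalty, and the sample values are all hypothetical.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def rigid_flow(point: Vec3, rotation: Sequence[Vec3], translation: Vec3) -> List[float]:
    """3D displacement of a static point induced by ego-motion: R @ p + t - p."""
    moved = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    return [moved[i] - point[i] for i in range(3)]


def masked_consistency_loss(
    points: Sequence[Vec3],
    scene_flow: Sequence[Vec3],
    rigidity_mask: Sequence[float],
    rotation: Sequence[Vec3],
    translation: Vec3,
) -> float:
    """Soft-mask-weighted mean L1 gap between predicted and rigid flow.

    Assumption: mask values lie in [0, 1], with 1 meaning "static".
    """
    total = 0.0
    weight_sum = 0.0
    for p, f, w in zip(points, scene_flow, rigidity_mask):
        r = rigid_flow(p, rotation, translation)
        total += w * sum(abs(f[i] - r[i]) for i in range(3))
        weight_sum += w
    # Guard against an all-zero mask.
    return total / max(weight_sum, 1e-8)


# Hypothetical toy data: no rotation, scene shifts one unit toward the camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [0.0, 0.0, -1.0]
points = [[0.0, 0.0, 10.0], [2.0, 0.0, 5.0]]
# The second point also moves sideways, so it is not explained by ego-motion.
scene_flow = [[0.0, 0.0, -1.0], [1.0, 0.0, -1.0]]

loss_masked = masked_consistency_loss(points, scene_flow, [1.0, 0.0], identity, translation)
loss_unmasked = masked_consistency_loss(points, scene_flow, [1.0, 1.0], identity, translation)
print(loss_masked, loss_unmasked)
```

With the dynamic point masked out, the loss depends only on the static point, whose flow agrees with the ego-motion. With both points weighted equally, the independently moving point contributes an error that ego-motion cannot explain, which is the kind of contamination a soft mask is meant to suppress.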

