Cascaded Scene Flow Prediction using Semantic Segmentation
This work addresses scene flow estimation for autonomous driving and reduces inconsistent motion predictions inside rigid objects. The contribution is incremental, since it builds on existing assumptions about rigid motion.
The paper predicts 3D scene flow from stereo camera frames, using semantic segmentation to keep shape and motion estimates consistent within rigidly moving objects, and reports state-of-the-art performance on the KITTI benchmark.
Given two consecutive frames from a pair of stereo cameras, 3D scene flow methods simultaneously estimate the 3D geometry and motion of the observed scene. Many existing approaches use superpixels for regularization, but may predict inconsistent shapes and motions inside rigidly moving objects. We instead assume that scenes consist of foreground objects rigidly moving in front of a static background, and use semantic cues to produce pixel-accurate scene flow estimates. Our cascaded classification framework accurately models 3D scenes by iteratively refining semantic segmentation masks, stereo correspondences, 3D rigid motion estimates, and optical flow fields. We evaluate our method on the challenging KITTI autonomous driving benchmark, and show that accounting for the motion of segmented vehicles leads to state-of-the-art performance.
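To make the rigid-motion assumption concrete, the following is a minimal sketch (not the paper's implementation) of how a single 3D rigid motion applied to a segmented object induces a 2D optical flow field. The function name `rigid_flow`, the pinhole intrinsics matrix `K`, the per-pixel `depth` map, and the toy values are all assumptions introduced for illustration; the abstract does not specify them.

```python
import numpy as np

def rigid_flow(depth, mask, K, R, t):
    """Optical flow induced by a rigid motion (R, t) on the pixels in `mask`.

    Hypothetical helper for illustration: depth is (h, w) and positive,
    mask is a boolean (h, w) array, K is a 3x3 pinhole intrinsics matrix,
    R is a 3x3 rotation, and t is a 3-vector in camera coordinates.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates, shape (h, w, 3).
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project every pixel to a 3D point using its depth.
    points = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Apply the same rigid motion to every point.
    moved = points @ R.T + t
    # Project the moved points back into the image.
    proj = moved @ K.T
    uv_next = proj[..., :2] / proj[..., 2:3]
    flow = uv_next - pix[..., :2]
    # Pixels outside the segment are left at zero in this sketch.
    flow[~mask] = 0.0
    return flow

# Toy example (assumed values): a 4x4 image, constant depth of 10 units,
# focal length of 100 pixels, and an object translating 1 unit along x.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 10.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
flow = rigid_flow(depth, mask, K, np.eye(3), np.array([1.0, 0.0, 0.0]))
print(flow[1, 1])  # horizontal flow of f * t_x / Z = 10 pixels, no vertical flow
```

Because every pixel in the mask shares one motion (R, t), the resulting flow is consistent across the whole object by construction, which is the property the abstract contrasts with superpixel-based regularization. In a fuller pipeline the same function could plausibly be applied to the background mask with the camera's ego-motion, but that composition is an assumption of this sketch rather than something stated in the abstract.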