Learning Residual Flow as Dynamic Motion from Stereo Videos
This work addresses scene understanding for autonomous vehicles by improving motion decomposition from stereo video. It appears incremental, however, since it builds on existing geometry-based approaches and adds learning-based components.
The paper decomposes the 3D scene flow observed in stereo videos into stationary scene elements and dynamic object motion. It uses an unsupervised learning framework that jointly reasons about camera motion, optical flow, and the 3D motion of moving objects. On the KITTI dataset, the method outperforms state-of-the-art algorithms on optical flow and visual odometry.
We present a method for decomposing the 3D scene flow observed from a moving stereo rig into stationary scene elements and dynamic object motion. Our unsupervised learning framework jointly reasons about the camera motion, optical flow, and 3D motion of moving objects. Three cooperating networks predict stereo matching, camera motion, and residual flow, which is the component of the flow caused by object motion rather than by camera motion. Following rigid projective geometry, the estimated stereo depth guides the camera motion estimation, and the depth and camera motion together guide the residual flow estimation. We also explicitly estimate the 3D scene flow of dynamic objects from the residual flow and scene depth. Experiments on the KITTI dataset demonstrate the effectiveness of our approach and show that our method outperforms other state-of-the-art algorithms on the optical flow and visual odometry tasks.
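The decomposition rests on rigid projective geometry. A pixel with known depth is back-projected through the camera intrinsics and moved by the camera's relative pose, which yields the flow a static scene would produce (the rigid flow). Subtracting that rigid flow from the full optical flow leaves the residual flow, and lifting both ends of a correspondence to 3D gives the object's scene flow. In the paper the residual flow is predicted by a network guided by depth and camera motion; the closed-form sketch here only illustrates the geometric relationship and is not the authors' implementation. It assumes a pinhole intrinsics matrix `K`, a relative pose `(R, t)` mapping frame-t camera coordinates to frame t+1, and dense NumPy arrays. All function names and the `depth2_warped` input are hypothetical.

```python
import numpy as np


def pixel_grid(h, w):
    """Return (u, v) pixel-coordinate arrays of shape (h, w)."""
    return np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))


def backproject(depth, K):
    """Lift each pixel (u, v) with depth d to the 3D point d * K^-1 [u, v, 1]^T.
    Returns an (H, W, 3) array in the frame-t camera coordinates."""
    u, v = pixel_grid(*depth.shape)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    return (pix @ np.linalg.inv(K).T) * depth[..., None]


def project(points, K):
    """Project (H, W, 3) camera-frame points to (H, W, 2) pixel coordinates."""
    p = points @ K.T
    return p[..., :2] / p[..., 2:3]


def rigid_flow(depth, K, R, t):
    """Flow a fully static scene would induce under camera motion (R, t)."""
    moved = backproject(depth, K) @ R.T + t      # R X + t for every pixel
    u, v = pixel_grid(*depth.shape)
    return project(moved, K) - np.stack([u, v], axis=-1)


def residual_flow(optical_flow, depth, K, R, t):
    """Flow remaining after removing the camera-induced component;
    ideally nonzero only on independently moving objects."""
    return optical_flow - rigid_flow(depth, K, R, t)


def object_scene_flow(optical_flow, depth1, depth2_warped, K, R, t):
    """3D motion of each point with camera ego-motion removed, expressed in
    frame-(t+1) camera coordinates. ASSUMPTION: depth2_warped[i, j] is the
    frame-(t+1) depth already sampled at pixel (i, j)'s correspondence."""
    u, v = pixel_grid(*depth1.shape)
    target = np.stack([u + optical_flow[..., 0],
                       v + optical_flow[..., 1],
                       np.ones_like(u)], axis=-1)
    x2 = (target @ np.linalg.inv(K).T) * depth2_warped[..., None]
    x1_moved = backproject(depth1, K) @ R.T + t  # where a static point would be
    return x2 - x1_moved


# Toy check: flat scene at depth 10, camera translating 0.1 along x,
# and one pixel belonging to an object that moves 2 extra pixels.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
depth = np.full((5, 5), 10.0)
flow = rigid_flow(depth, K, R, t)   # static scene: 1 px shift in u everywhere
flow[2, 3, 0] += 2.0                # the moving object's extra motion
res = residual_flow(flow, depth, K, R, t)
sf = object_scene_flow(flow, depth, depth, K, R, t)
```

Because the residual is defined as a difference, any error in the estimated depth or camera pose leaks directly into it. This coupling is one reason to estimate the three quantities jointly rather than in isolation.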