Dynamic Dense RGB-D SLAM using Learning-based Visual Odometry
This work addresses SLAM in dynamic scenes, a practical challenge for robotics and AR/VR applications. Its contribution is incremental, however, since it builds on existing learning-based visual odometry methods.
The paper tackles the problem of dense RGB-D SLAM in dynamic environments by proposing a pipeline that integrates TartanVO (a learning-based visual odometry method) with dynamic/static segmentation to filter out moving objects, enabling reconstruction of a static map and iterative pose refinement.
We propose a dense dynamic RGB-D SLAM pipeline based on a learning-based visual odometry method, TartanVO. Like other direct (rather than feature-based) methods, TartanVO estimates camera pose through dense optical flow, an approach that is valid only for static scenes and does not account for dynamic objects. Because it relies on the color constancy assumption, optical flow alone cannot differentiate between dynamic and static pixels. Therefore, to reconstruct a static map with such direct methods, our pipeline performs dynamic/static segmentation by leveraging the optical flow output and fuses only static points into the map. Moreover, we re-render the input frames with the dynamic pixels removed and iteratively pass them back into the visual odometry to refine the pose estimate.
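The abstract does not spell out how the optical flow output is turned into a dynamic/static segmentation. One common way to do this, shown here only as a minimal sketch and not as the authors' method, is to compare the observed flow with the flow that camera motion alone would induce on a static scene, and to mark pixels with a large residual as dynamic. The function names `rigid_flow` and `static_mask`, the threshold `thresh_px`, and the toy camera values are all hypothetical; nothing here is part of the TartanVO API.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow (in pixels) that camera motion alone induces on a static scene.

    depth: (H, W) depth of frame 1; K: 3x3 pinhole intrinsics;
    R, t: assumed to map frame-1 camera coordinates into frame 2.
    Returns the (H, W, 2) flow and a validity mask.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)       # homogeneous pixels
    pts1 = (pix @ np.linalg.inv(K).T) * depth[..., None]   # back-project to 3D
    pts2 = pts1 @ R.T + t                                  # move into frame 2
    proj = pts2 @ K.T                                      # project again
    z = proj[..., 2]
    valid = (depth > 0) & (z > 1e-6)                       # skip holes / points behind camera
    safe_z = np.where(valid, z, 1.0)                       # avoid division by zero
    uv2 = proj[..., :2] / safe_z[..., None]
    return uv2 - pix[..., :2], valid

def static_mask(flow, depth, K, R, t, thresh_px=1.0):
    """True where the observed flow agrees with the camera-induced flow."""
    ego_flow, valid = rigid_flow(depth, K, R, t)
    residual = np.linalg.norm(flow - ego_flow, axis=-1)
    return valid & (residual < thresh_px)

# Toy example (all values are illustrative assumptions):
# a flat wall 2 m away, camera translating sideways, one patch moving on its own.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
flow, _ = rigid_flow(depth, K, R, t)
flow[10:20, 30:40] += np.array([3.0, 0.0])   # independent object motion
mask = static_mask(flow, depth, K, R, t)
print(int(mask.sum()), "static pixels of", mask.size)
```

In a pipeline like the one described, the observed flow and the pose would presumably come from the visual odometry network rather than from synthetic data; only pixels where the mask is true would be fused into the map, and the masked frames could be fed back for another pose estimate.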