Learning to Complete Object Shapes for Object-level Mapping in Dynamic Scenes
This work addresses accurate object-level mapping in dynamic environments for robotics and computer vision applications, and it improves incrementally on prior techniques.
The paper proposes an object-level mapping system for dynamic scenes that segments, tracks, and reconstructs objects and completes their geometries from depth inputs and shape priors. It reports better tracking and reconstruction performance than existing methods.
In this paper, we propose a novel object-level mapping system that can simultaneously segment, track, and reconstruct objects in dynamic scenes. It can further predict and complete their full geometries by conditioning on reconstructions from depth inputs and a category-level shape prior, with the aim that completed object geometry leads to better object reconstruction and tracking accuracy. For each incoming RGB-D frame, we perform instance segmentation to detect objects and build data associations between the detections and the existing object maps. A new object map is created for each unmatched detection. For each matched object, we jointly optimise its pose and latent geometry representation using a geometric residual and a differentiable rendering residual towards its shape prior and completed geometry. Our approach shows better tracking and reconstruction performance than methods using traditional volumetric mapping or learned shape priors. We evaluate its effectiveness quantitatively and qualitatively on both synthetic and real-world sequences.
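The per-frame bookkeeping described in the abstract (associate detections with existing object maps, create a new map for each unmatched detection) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the abstract does not state the association criterion, so greedy matching on 2D mask IoU with a 0.5 threshold is an assumption, and the names `ObjectMap`, `mask_iou`, `associate`, and `process_frame` are hypothetical. Masks are represented as sets of pixel coordinates to keep the example dependency-free, and the joint pose and shape optimisation is left as a comment.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def mask_iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection-over-union of two masks given as sets of pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


@dataclass
class ObjectMap:
    """Hypothetical stand-in for a per-object map."""
    object_id: int
    mask: Set[Pixel]  # last observed 2D footprint (assumed proxy for a rendered mask)
    num_observations: int = 1


def associate(
    detections: List[Set[Pixel]],
    objects: List[ObjectMap],
    iou_threshold: float = 0.5,  # assumed value, not from the paper
) -> Tuple[Dict[int, int], List[int]]:
    """Greedy one-to-one matching of detections to objects by descending IoU.

    Returns (matches, unmatched), where matches maps a detection index to an
    object index and unmatched lists detection indices with no match.
    """
    pairs = sorted(
        (
            (mask_iou(det, obj.mask), di, oi)
            for di, det in enumerate(detections)
            for oi, obj in enumerate(objects)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_objects: Set[int] = set()
    for iou, di, oi in pairs:
        if iou < iou_threshold:
            break  # remaining pairs overlap even less
        if di in matches or oi in used_objects:
            continue  # keep the matching one-to-one
        matches[di] = oi
        used_objects.add(oi)
    unmatched = [di for di in range(len(detections)) if di not in matches]
    return matches, unmatched


def process_frame(
    detections: List[Set[Pixel]],
    objects: List[ObjectMap],
    iou_threshold: float = 0.5,
) -> Tuple[Dict[int, int], List[int]]:
    """Update matched objects in place and append a new map per unmatched detection."""
    matches, unmatched = associate(detections, objects, iou_threshold)
    for di, oi in matches.items():
        objects[oi].mask = detections[di]
        objects[oi].num_observations += 1
        # The joint pose and latent-geometry optimisation would run here.
    for di in unmatched:
        # IDs stay unique as long as objects are never removed from the list.
        objects.append(ObjectMap(object_id=len(objects), mask=detections[di]))
    return matches, unmatched
```

Greedy matching is chosen here only for brevity; an optimal assignment (for example the Hungarian algorithm) or a 3D overlap criterion could be substituted without changing the surrounding bookkeeping.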