OV-MAP: Open-Vocabulary Zero-Shot 3D Instance Segmentation Map for Robots
This paper addresses the challenge of precise object recognition in 3D maps for mobile robots and represents an incremental improvement over existing zero-shot segmentation methods.
The paper tackles a specific problem in open-world 3D mapping for robots: overlapping features from adjacent voxels reduce instance-level precision. It introduces OV-MAP, which combines class-agnostic segmentation with 3D mask voting to achieve accurate zero-shot 3D instance segmentation without 3D supervised models, and reports superior performance on datasets such as ScanNet200 and Replica.
We introduce OV-MAP, a novel approach to open-world 3D mapping for mobile robots that integrates open-vocabulary features into 3D maps to enhance object recognition. A significant challenge arises when features from adjacent voxels overlap: features spill over voxel boundaries and blend neighboring regions together, which reduces instance-level precision. Our method overcomes this by using a class-agnostic segmentation model to project 2D masks into 3D space, together with a supplemented depth image created by merging raw depth with synthetic depth from point clouds. Combined with a 3D mask voting mechanism, this enables accurate zero-shot 3D instance segmentation without relying on 3D supervised segmentation models. We assess our method through comprehensive experiments on public datasets such as ScanNet200 and Replica, demonstrating superior zero-shot performance, robustness, and adaptability across diverse environments. We also conduct real-world experiments showing that the method remains robust and adaptable in diverse real-world environments.
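To make the pipeline in the abstract more concrete, here is a minimal Python sketch of three of its steps under stated assumptions. It is not the authors' implementation. The function names `supplement_depth`, `backproject_mask`, and `vote_instances` are hypothetical. The sketch assumes that "merging" raw and synthetic depth means filling pixels where the sensor returned no reading (encoded as 0.0) with the synthetic value. It also assumes a standard pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, and it substitutes a simple per-point majority vote across views for the paper's 3D mask voting mechanism, whose exact rule is not described here.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Depth = List[List[float]]  # row-major depth image in metres; 0.0 marks a missing reading (assumption)
Mask = List[List[bool]]    # row-major binary mask from a class-agnostic 2D segmenter
Point3 = Tuple[float, float, float]


def supplement_depth(raw: Depth, synthetic: Depth) -> Depth:
    """Keep valid raw depth and fill holes (0.0) with synthetic depth from the point cloud.

    Hypothetical merge rule: the paper's exact merging strategy may differ.
    """
    return [
        [r if r > 0.0 else s for r, s in zip(raw_row, syn_row)]
        for raw_row, syn_row in zip(raw, synthetic)
    ]


def backproject_mask(mask: Mask, depth: Depth,
                     fx: float, fy: float, cx: float, cy: float) -> List[Point3]:
    """Lift the pixels inside a 2D mask to 3D camera-frame points (assumed pinhole model)."""
    points: List[Point3] = []
    for v, (mask_row, depth_row) in enumerate(zip(mask, depth)):
        for u, (inside, d) in enumerate(zip(mask_row, depth_row)):
            if inside and d > 0.0:  # skip pixels that still lack depth
                points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points


def vote_instances(observations: Dict[int, List[int]]) -> Dict[int, Optional[int]]:
    """Assign each 3D point the instance id it received most often across views.

    observations maps a point id to one mask id per view; -1 means the point was
    not covered by any mask in that view. Stand-in for the paper's voting scheme.
    """
    labels: Dict[int, Optional[int]] = {}
    for point_id, votes in observations.items():
        valid = [m for m in votes if m >= 0]
        # Counter.most_common breaks ties by first occurrence
        labels[point_id] = Counter(valid).most_common(1)[0][0] if valid else None
    return labels


if __name__ == "__main__":
    raw = [[1.0, 0.0], [0.0, 2.0]]
    synthetic = [[1.1, 1.5], [1.8, 2.1]]
    depth = supplement_depth(raw, synthetic)
    print(depth)  # [[1.0, 1.5], [1.8, 2.0]]
    mask = [[True, True], [False, False]]
    print(backproject_mask(mask, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
    print(vote_instances({0: [3, 3, 5], 1: [-1, -1], 2: [5, 7, 5]}))
```

Feeding the supplemented depth into the back-projection is the point of the design in this sketch: mask pixels where the sensor returned nothing (such as the top-right pixel above) still land in 3D because the synthetic depth fills the hole. The per-point vote then resolves conflicting labels from different viewpoints instead of letting them blend together.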