3DS-SLAM: A 3D Object Detection based Semantic SLAM towards Dynamic Indoor Environments
This work addresses robust SLAM in dynamic scenes for robotics and AR/VR applications; it represents a strong, specific gain rather than a foundational advance.
The paper tackles the decline in camera localization accuracy in dynamic indoor environments by introducing 3DS-SLAM, a tightly coupled semantic SLAM algorithm that integrates 3D object detection with geometric constraints. It reports an average improvement of 98.01% over ORB-SLAM2 on the dynamic sequences of the TUM RGB-D dataset.
Variable factors in the environment can degrade camera localization accuracy because they violate the static-environment assumption that underlies Simultaneous Localization and Mapping (SLAM) algorithms. Recent semantic SLAM systems for dynamic environments rely solely on 2D semantic information, rely solely on geometric information, or combine the two in a loosely integrated manner. In this paper, we introduce 3DS-SLAM (3D Semantic SLAM), a system tailored to dynamic scenes that uses visual 3D object detection. 3DS-SLAM is a tightly coupled algorithm that resolves semantic and geometric constraints sequentially. We design a 3D part-aware hybrid transformer for point cloud-based object detection to identify dynamic objects. We then propose a dynamic feature filter based on HDBSCAN clustering to extract objects with significant absolute depth differences. Compared with ORB-SLAM2, 3DS-SLAM achieves an average improvement of 98.01% across the dynamic sequences of the TUM RGB-D dataset. It also outperforms four other leading SLAM systems designed for dynamic environments.
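The depth-based filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free it swaps HDBSCAN for a simple gap-based grouping of sorted depth values, and every name in it (`Feature`, `group_by_depth`, `filter_dynamic`, `max_gap`, `min_size`) is hypothetical. It also assumes that, inside a detected object's image region, the nearest depth cluster belongs to the dynamic object, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Feature:
    u: float      # pixel column
    v: float      # pixel row
    depth: float  # depth in metres from the RGB-D sensor


def group_by_depth(depths: Sequence[float], max_gap: float, min_size: int) -> List[int]:
    """Label each depth with a cluster id; -1 marks noise (groups below min_size).

    Stand-in for HDBSCAN: sorted depths are split wherever the jump between
    neighbours exceeds max_gap. Labels increase with depth, so 0 is nearest.
    """
    order = sorted(range(len(depths)), key=lambda i: depths[i])
    groups: List[List[int]] = []
    current: List[int] = []
    for i in order:
        if current and depths[i] - depths[current[-1]] > max_gap:
            groups.append(current)
            current = []
        current.append(i)
    if current:
        groups.append(current)

    labels = [-1] * len(depths)
    next_label = 0
    for group in groups:
        if len(group) >= min_size:
            for i in group:
                labels[i] = next_label
            next_label += 1
    return labels


def filter_dynamic(features: Sequence[Feature], max_gap: float = 0.3,
                   min_size: int = 3) -> Tuple[List[Feature], List[Feature]]:
    """Split features from one detected object's region into (static, dynamic).

    Assumption: the nearest valid depth cluster (label 0) is the dynamic object.
    Noise points and farther clusters are kept as static in this sketch.
    """
    labels = group_by_depth([f.depth for f in features], max_gap, min_size)
    dynamic = [f for f, lab in zip(features, labels) if lab == 0]
    static = [f for f, lab in zip(features, labels) if lab != 0]
    return static, dynamic


# Hypothetical features: a person at about 1.5 m, a wall at about 4 m, one stray point.
features = [
    Feature(100, 120, 1.50), Feature(104, 118, 1.55),
    Feature(110, 125, 1.48), Feature(108, 130, 1.52),
    Feature(140, 90, 3.90), Feature(150, 95, 4.00),
    Feature(145, 100, 3.95), Feature(160, 110, 2.70),
]
static, dynamic = filter_dynamic(features)
print(len(static), len(dynamic))
```

In a pipeline closer to the paper's description, the grouping step could be replaced by a real HDBSCAN implementation such as `sklearn.cluster.HDBSCAN` (scikit-learn 1.3 or later), which also labels noise points as -1. Only the features returned as static would then be passed on to pose estimation.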