RO | AI | CV | Jun 18, 2025

MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System

arXiv:2506.15402v1 | 2 citations | h-index: 11 | IEEE Robot Autom Lett
Originality: Incremental advance
AI Analysis

This work addresses limitations in object-level SLAM for robotics by using surround-view cameras to improve mapping in outdoor scenarios. It appears incremental, since it extends existing SLAM methods with multi-camera integration.

The paper tackles the problem of object-level SLAM in complex outdoor environments by proposing MCOO-SLAM, a multi-camera omnidirectional system that integrates point features and object-level landmarks with open-vocabulary semantics, achieving accurate localization and scalable object-level mapping with improved robustness to occlusion, pose variation, and environmental complexity.

Object-level SLAM offers structured and semantically meaningful environment representations, making it more interpretable and suitable for high-level robotic tasks. However, most existing approaches rely on RGB-D sensors or monocular views, which suffer from narrow fields of view, occlusion sensitivity, and limited depth perception, especially in large-scale or outdoor environments. These limitations often restrict the system to observing only partial views of objects from limited perspectives, leading to inaccurate object modeling and unreliable data association. In this work, we propose MCOO-SLAM, a novel Multi-Camera Omnidirectional Object SLAM system that fully leverages surround-view camera configurations to achieve robust, consistent, and semantically enriched mapping in complex outdoor scenarios. Our approach integrates point features and object-level landmarks enhanced with open-vocabulary semantics. A semantic-geometric-temporal fusion strategy is introduced for robust object association across multiple views, leading to improved consistency and accurate object modeling. An omnidirectional loop closure module is designed to enable viewpoint-invariant place recognition using scene-level descriptors. Furthermore, the constructed map is abstracted into a hierarchical 3D scene graph to support downstream reasoning tasks. Extensive real-world experiments demonstrate that MCOO-SLAM achieves accurate localization and scalable object-level mapping with improved robustness to occlusion, pose variation, and environmental complexity.

