RO · May 17

MUSE: Multimodal Uncertainty Quantification of State Estimation

Minkyung Kim, Henry Che, Bhargav Chandaka, Bhumsitt Pramuanpornsatid, Chengyu Yang, Sheng Cheng, Xiaofeng Wang, Naira Hovakimyan, Shenlong Wang

arXiv:2605.17421 · 15.1

Predicted impact: top 38% in RO · last 90 days · Originality: Incremental advance

AI Analysis

For robotics applications requiring reliable state estimation, MUSE addresses the challenge of uncertainty quantification in multimodal sensor fusion.

MUSE introduces a real-time, learning-based framework that uses Mamba to quantify localization uncertainty in visual-inertial odometry from multiple asynchronous sensor streams. The authors report better reliability and robustness than existing uncertainty quantification methods on public and in-house datasets.

Accurate visual state estimation has been a central topic in robotics with a wide range of applications in robot navigation, autonomous driving, and autonomous flight. Recent advances in robot perception have led to significant improvements in the accuracy and robustness of state estimation, yet a fundamental challenge remains in how to quantify and calibrate its precision, i.e., how confident we are in an estimate and whether failures can be detected. This issue is particularly pronounced in visual-inertial odometry (VIO), where the heteroscedastic and multimodal nature of the problem makes uncertainty quantification especially difficult. This paper introduces MUSE (Multimodal Uncertainty Quantification of State Estimation), a novel real-time learning-based framework that leverages the strong and efficient sequential modeling capacity of Mamba to estimate localization uncertainty from multiple asynchronous sensor streams. Experiments on both public and in-house datasets demonstrate that MUSE achieves superior reliability and robustness compared to existing uncertainty quantification methods, and ablation studies justify the benefits of its key design choices.
