RO · Apr 12

MonoEM-GS: Monocular Expectation-Maximization Gaussian Splatting SLAM

arXiv:2604.10593 · 69.5 · h-index: 2
AI Analysis

For monocular SLAM researchers, this work provides a method to stabilize noisy geometric priors from foundation models into a consistent 3D representation, enabling downstream tasks like open-set segmentation.

MonoEM-GS addresses view-dependent and noisy geometric predictions from feed-forward models in monocular SLAM by integrating them into a global Gaussian Splatting representation via an Expectation-Maximization formulation and ICP-based alignment, achieving competitive performance on 7-Scenes, TUM RGB-D, and Replica.

Feed-forward geometric foundation models can infer dense point clouds and camera motion directly from RGB streams, providing priors for monocular SLAM. However, their predictions are often view-dependent and noisy: geometry can vary across viewpoints and under image transformations, and local metric properties may drift between frames. We present MonoEM-GS, a monocular mapping pipeline that integrates such geometric predictions into a global Gaussian Splatting representation while explicitly addressing these inconsistencies. MonoEM-GS couples Gaussian Splatting with an Expectation-Maximization formulation to stabilize geometry, and employs ICP-based alignment for monocular pose estimation. Beyond geometry, MonoEM-GS parameterizes Gaussians with multi-modal features, enabling in-place open-set segmentation and other downstream queries directly on the reconstructed map. We evaluate MonoEM-GS on 7-Scenes, TUM RGB-D and Replica, and compare against recent baselines.
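
The abstract does not spell out the Expectation-Maximization formulation, so the following is only a generic illustration of the idea, not the authors' method. It is a minimal sketch, assuming noisy depth samples along a single ray are modelled as a 1D mixture of Gaussians. The E-step softly assigns each sample to a Gaussian and the M-step re-estimates each Gaussian from its weighted samples. The function name `em_fuse_depths`, the variance floor, and the toy data are all hypothetical.

```python
import math


def em_fuse_depths(depths, means, variances, weights, iters=20, var_floor=1e-6):
    """Fit a 1D Gaussian mixture to noisy depth samples with plain EM.

    Hypothetical helper for illustration; not taken from the paper.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    n = len(depths)
    for _ in range(iters):
        # E-step: responsibility of each Gaussian for each depth sample.
        resp = []
        for d in depths:
            likes = [
                weights[j]
                * math.exp(-0.5 * (d - means[j]) ** 2 / variances[j])
                / math.sqrt(2.0 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(likes)
            resp.append([like / total for like in likes])

        # M-step: re-estimate mean, variance and weight from soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * d for r, d in zip(resp, depths)) / nj
            var = sum(r[j] * (d - means[j]) ** 2 for r, d in zip(resp, depths)) / nj
            variances[j] = max(var, var_floor)  # floor avoids collapse to zero
            weights[j] = nj / n
    return means, variances, weights


# Toy data (assumed): noisy depths from two surfaces near 1 m and 3 m.
depths = [1.0, 1.1, 0.9, 1.05, 0.95, 3.0, 3.1, 2.9, 3.05, 2.95]
means, variances, weights = em_fuse_depths(
    depths, means=[0.5, 3.5], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(means, variances, weights)
```

In this toy setting the soft assignments let inconsistent observations of the same surface be averaged into one stable estimate rather than each spawning its own geometry. A full system would operate on 3D Gaussians with rendering-based likelihoods, which this sketch does not attempt.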

