CV · May 28

GMOS: Grounding Moving Object Segmentation in 3D Space and Time

arXiv:2605.30352
AI Analysis

For researchers in video segmentation and autonomous systems, GMOS offers a method for segmenting moving objects in 3D space and time that reports state-of-the-art accuracy, runs faster than prior multi-object methods, and supports online inference.

GMOS addresses limitations of current Moving Object Segmentation methods by grounding segmentation in 3D space and time, operating directly on RGB video to produce 3D-aware, temporally fine-grained segmentation. It achieves state-of-the-art results across MOS, MOS-I, and Unsupervised VOS benchmarks, running significantly faster than prior multi-object methods.

Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. Current MOS methods, however, exhibit two fundamental limitations: they rely on pre-computed 2D auxiliary modalities such as optical flow or point trajectories that lack 3D geometric information, and they treat motion as a sequence-level attribute, overlooking the instantaneous motion state of each object. We address both by grounding MOS in 3D space and time, and propose GMOS, a framework that operates directly on RGB video to produce 3D-aware, temporally fine-grained segmentation of multiple moving objects, alongside a foreground–background variant, GMOS-S, for faster deployment. To support training and evaluation in this regime, we curate GMOS-2K, a dataset of 2,210 real-world videos with per-object temporal motion annotations drawn from five established Video Object Segmentation (VOS) benchmarks, and formalise MOS-I ("I" for instantaneous), a temporally fine-grained evaluation protocol with three complementary metrics. GMOS achieves state-of-the-art results across MOS, MOS-I, and Unsupervised VOS benchmarks, while running significantly faster than prior multi-object MOS methods and supporting online inference for streaming deployment.
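
The distinction the abstract draws between motion as a sequence-level attribute and the instantaneous motion state of each object can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `motion_flags` layout, the object names, and both helper functions are hypothetical and do not reflect the GMOS-2K annotation format or the MOS-I metrics, which the abstract does not specify.

```python
from typing import Dict, List

# Hypothetical per-object, per-frame motion flags (True = moving in that frame).
# This layout is an assumption for illustration, not the GMOS-2K format.
motion_flags: Dict[str, List[bool]] = {
    "car": [False, False, True, True, True],
    "person": [True, True, False, False, False],
    "parked_bike": [False, False, False, False, False],
}


def sequence_level_labels(flags: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Collapse per-frame flags to one label per object: moving if it ever moves."""
    return {obj: any(per_frame) for obj, per_frame in flags.items()}


def moving_objects_at(flags: Dict[str, List[bool]], frame: int) -> List[str]:
    """Return the objects whose motion flag is True at the given frame index."""
    return sorted(obj for obj, per_frame in flags.items() if per_frame[frame])


if __name__ == "__main__":
    print(sequence_level_labels(motion_flags))
    for t in range(5):
        print(t, moving_objects_at(motion_flags, t))
```

In this toy data, a sequence-level view labels both `car` and `person` as moving for the whole clip, while the per-frame view shows that they never move at the same time. That per-frame information is what a temporally fine-grained protocol would need to evaluate.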

