CV | Jun 5, 2025

Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos

arXiv:2506.05546v2 | h-index: 35 | CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of dynamic scene analysis in 3D computer vision for applications like robotics and AR, though the contribution is incremental, as it builds on existing 3D models and 2D segmentation methods.

The paper tackles the problem of segmenting moving objects in dynamic egocentric videos by proposing Layered Motion Fusion, which fuses 2D motion segmentation predictions into layered radiance fields with test-time refinement, resulting in segmentation predictions that surpass the 2D baseline by a large margin.

Computer vision is largely based on 2D techniques, with 3D vision still relegated to a relatively narrow subset of applications. However, by building on recent advances in 3D models such as neural radiance fields, some authors have shown that 3D techniques can at last improve outputs extracted from independent 2D views, by fusing them into 3D and denoising them. This is particularly helpful in egocentric videos, where the camera motion is significant, but only under the assumption that the scene itself is static. In fact, as shown in the recent analysis conducted by EPIC Fields, 3D techniques are ineffective when it comes to studying dynamic phenomena, and, in particular, when segmenting moving objects. In this paper, we look into this issue in more detail. First, we propose to improve dynamic segmentation in 3D by fusing motion segmentation predictions from a 2D-based model into layered radiance fields (Layered Motion Fusion). However, the high complexity of long, dynamic videos makes it challenging to capture the underlying geometric structure, and, as a result, hinders the fusion of motion cues into the (incomplete) scene geometry. We address this issue through test-time refinement, which helps the model to focus on specific frames, thereby reducing the data complexity. This results in a synergy between motion fusion and the refinement, and in turn leads to segmentation predictions of the 3D model that surpass the 2D baseline by a large margin. This demonstrates that 3D techniques can enhance 2D analysis even for dynamic phenomena in a challenging and realistic setting.
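The general idea of fusing independent 2D predictions into a shared 3D representation to denoise them can be illustrated with a toy sketch. This is not the paper's method: it uses no radiance field, no layers, and no test-time refinement. It assumes, purely for illustration, that each 3D point has a fixed static/dynamic label and that the correspondence between per-frame predictions and 3D points is already known. The names `fuse_motion_votes` and `run_demo` and all parameters are hypothetical.

```python
import random


def fuse_motion_votes(observations, num_points):
    """Average per-frame 'dynamic' probabilities for each 3D point.

    observations: iterable of (point_id, prob_dynamic) pairs, one per
    frame in which the point was observed (assumed correspondences).
    Points that were never observed get an uninformative 0.5.
    """
    sums = [0.0] * num_points
    counts = [0] * num_points
    for point_id, prob in observations:
        sums[point_id] += prob
        counts[point_id] += 1
    return [s / c if c else 0.5 for s, c in zip(sums, counts)]


def run_demo(seed=0, num_points=200, num_frames=25, flip_prob=0.3):
    """Compare noisy per-frame accuracy with accuracy after fusion."""
    rng = random.Random(seed)
    # Hypothetical ground truth: roughly 30% of points are dynamic.
    truth = [rng.random() < 0.3 for _ in range(num_points)]

    observations = []
    correct_2d = 0
    for _ in range(num_frames):
        for point_id, is_dynamic in enumerate(truth):
            # Simulated 2D prediction: wrong with probability flip_prob.
            flipped = rng.random() < flip_prob
            pred = (not is_dynamic) if flipped else is_dynamic
            observations.append((point_id, 1.0 if pred else 0.0))
            correct_2d += pred == is_dynamic

    fused = fuse_motion_votes(observations, num_points)
    # Threshold the fused probability to get one label per 3D point.
    correct_3d = sum((p > 0.5) == t for p, t in zip(fused, truth))

    frame_acc = correct_2d / (num_points * num_frames)
    fused_acc = correct_3d / num_points
    return frame_acc, fused_acc


if __name__ == "__main__":
    frame_acc, fused_acc = run_demo()
    print(f"per-frame accuracy: {frame_acc:.3f}")
    print(f"fused accuracy:     {fused_acc:.3f}")
```

In this toy setting, averaging many independent noisy votes per point tends to give a more accurate label than any single frame. In real dynamic egocentric video the correspondences are not given and the geometry is incomplete, which is the difficulty the abstract says the layered radiance fields and test-time refinement are meant to address.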

