CV | Apr 11, 2024

Gaga: Group Any Gaussians via 3D-aware Memory Bank

Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang

arXiv:2404.07977v3 | 24.343 citations | h-index: 35

Originality: Incremental advance

AI Analysis

The paper addresses 3D scene understanding and manipulation for real-world applications. Its approach is novel, but the advance appears incremental because it builds on existing segmentation models.

The paper tackles the problem of reconstructing and segmenting open-world 3D scenes from inconsistent 2D masks. It reports robust performance and precise mask label consistency, particularly for sparsely sampled images, and performs favorably against state-of-the-art methods in its evaluations.

We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot class-agnostic segmentation models. Contrasted to prior 3D scene segmentation approaches that rely on video object tracking or contrastive learning methods, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses through a novel 3D-aware memory bank. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, particularly beneficial for sparsely sampled images, ensuring precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot class-agnostic segmentation models, significantly enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as 3D scene understanding and manipulation.
