CV · Feb 20, 2025

Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing

Yoel Levy, David Shavin, Itai Lang, Sagie Benaim

arXiv:2502.14789v16.21 citations · h-index: 16

Originality: Incremental advance

AI Analysis

This work addresses the need for more flexible 3D feature representations in computer vision, though it is an incremental advance that builds on existing 2D-to-3D feature distillation methods.

The paper tackles the problem of 3D understanding and editing by proposing multiple disentangled feature fields to capture view-dependent and view-independent components, enabling tasks like segmentation and editing of reflective properties with user clicks.

Recent work has demonstrated the ability to leverage or distill pre-trained 2D features obtained using large pre-trained 2D models into 3D features, enabling impressive 3D editing and understanding capabilities using only 2D supervision. Although impressive, models assume that 3D features are captured using a single feature field and often make a simplifying assumption that features are view-independent. In this work, we propose instead to capture 3D features using multiple disentangled feature fields that capture different structural components of 3D features involving view-dependent and view-independent components, which can be learned from 2D feature supervision only. Subsequently, each element can be controlled in isolation, enabling semantic and structural understanding and editing capabilities. For instance, using a user click, one can segment 3D features corresponding to a given object and then segment, edit, or remove their view-dependent (reflective) properties. We evaluate our approach on the task of 3D segmentation and demonstrate a set of novel understanding and editing tasks.
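
The idea of splitting a distilled feature into view-independent and view-dependent parts can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an additive composition of the two components, standard volume-rendering weights along a ray, and a mean-squared-error loss against a 2D feature target, none of which the abstract confirms. All function names (`render_weights`, `render_disentangled`, `feature_loss`) are hypothetical.

```python
import numpy as np

def render_weights(sigma, delta):
    """Standard volume-rendering weights for the samples along one ray."""
    alpha = 1.0 - np.exp(-sigma * delta)  # opacity of each sample
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return alpha * trans

def render_disentangled(sigma, delta, feat_indep, feat_dep):
    """Render two feature fields separately, then compose them.

    feat_indep: (n_samples, dim) view-independent features per sample
    feat_dep:   (n_samples, dim) view-dependent features per sample,
                already evaluated for this ray's viewing direction
    The additive composition is an assumption made for illustration.
    """
    w = render_weights(sigma, delta)
    f_indep = w @ feat_indep
    f_dep = w @ feat_dep
    return f_indep, f_dep, f_indep + f_dep

def feature_loss(rendered, target):
    """MSE between a rendered feature and a 2D feature target (assumed loss)."""
    return float(np.mean((rendered - target) ** 2))

# Toy ray with random densities and features.
rng = np.random.default_rng(0)
n_samples, dim = 8, 4
sigma = rng.uniform(0.0, 2.0, n_samples)
delta = np.full(n_samples, 0.1)
feat_indep = rng.normal(size=(n_samples, dim))
feat_dep = rng.normal(size=(n_samples, dim))

f_indep, f_dep, f_full = render_disentangled(sigma, delta, feat_indep, feat_dep)

# "Editing" in this sketch: drop the view-dependent part of the feature.
f_edited = f_indep
```

Because each component is rendered on its own, it can be inspected, segmented, or zeroed out independently, which is the kind of isolated control the abstract describes.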
