CV · Mar 22, 2023

FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models

Jianglong Ye, Naiyan Wang, Xiaolong Wang

arXiv:2303.12786v1 · 22.857 citations · h-index: 43 · Has Code

Originality: Highly original

AI Analysis

This work addresses the need for generalizable 3D semantic feature extraction in computer vision, offering a novel approach that bridges 2D foundation models with 3D neural rendering for downstream tasks.

The paper tackles the problem of extending generalizable Neural Radiance Fields (NeRFs) beyond novel view synthesis to semantic understanding. It proposes FeatureNeRF, which distills pre-trained vision foundation models into 3D space and thereby enables applications such as 2D/3D semantic keypoint transfer and object part segmentation.
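Keypoint transfer with a semantic feature extractor is commonly done by nearest-neighbor matching in feature space. The sketch below illustrates that idea under stated assumptions: features for source keypoints and for candidate target locations (pixels or 3D points) are already extracted, and similarity is cosine similarity. The function name `transfer_keypoints` is hypothetical and not taken from the FeatureNeRF code.

```python
import numpy as np

def transfer_keypoints(src_kp_feats, tgt_feats, tgt_coords):
    """Hypothetical nearest-neighbor keypoint transfer (an assumption, not the paper's code).

    src_kp_feats: (K, C) features at the K source keypoints.
    tgt_feats:    (N, C) features at N candidate target locations.
    tgt_coords:   (N, D) coordinates of those locations (D=2 for pixels, D=3 for points).
    Returns the (K, D) transferred coordinates and the (K,) matched indices.
    """
    # L2-normalize so the dot product equals cosine similarity
    src = src_kp_feats / np.linalg.norm(src_kp_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)
    sim = src @ tgt.T                # (K, N) pairwise cosine similarities
    idx = np.argmax(sim, axis=1)     # best-matching target location per keypoint
    return tgt_coords[idx], idx

# Usage: two source keypoints matched against three target points
tgt_feats = np.eye(3)
tgt_coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
src_feats = np.array([[0.0, 2.0, 0.0], [5.0, 0.0, 0.0]])
coords, idx = transfer_keypoints(src_feats, tgt_feats, tgt_coords)
```

Cosine similarity is used here because it ignores feature magnitude. That choice is an assumption of this sketch, not a detail confirmed by the summary above.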

Recent works on generalizable NeRFs have shown promising results on novel view synthesis from single or few images. However, such models have rarely been applied to downstream tasks beyond synthesis, such as semantic understanding and parsing. In this paper, we propose a novel framework named FeatureNeRF to learn generalizable NeRFs by distilling pre-trained vision foundation models (e.g., DINO, Latent Diffusion). FeatureNeRF lifts the knowledge of 2D pre-trained foundation models into 3D space via neural rendering and then extracts deep features for 3D query points from the NeRF MLPs. Consequently, it can map 2D images to continuous 3D semantic feature volumes, which can be used for various downstream tasks. We evaluate FeatureNeRF on the tasks of 2D/3D semantic keypoint transfer and 2D/3D object part segmentation. Our extensive experiments demonstrate the effectiveness of FeatureNeRF as a generalizable 3D semantic feature extractor. Our project page is available at https://jianglongye.com/featurenerf/ .
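Distilling a 2D model "via neural rendering" generally means compositing per-sample features along each camera ray with the same volume-rendering weights NeRF uses for color, then penalizing the difference from the 2D teacher's feature at that pixel. The sketch below shows this general technique and rests on two assumptions: the teacher features (e.g., from DINO) are given per ray, and the distillation loss is a squared L2 error. The function names are hypothetical, and the exact loss and architecture in FeatureNeRF may differ.

```python
import numpy as np

def render_features(sigmas, feats, deltas):
    """Alpha-composite per-sample features along each ray (standard NeRF volume rendering).

    sigmas: (R, S) densities at S samples on each of R rays.
    feats:  (R, S, C) per-sample feature vectors predicted by the NeRF MLP.
    deltas: (R, S) distances between consecutive samples.
    Returns (R, C) rendered features, one per ray.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each sample
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): an exclusive cumulative product
    shifted = np.concatenate([np.ones_like(alphas[:, :1]), 1.0 - alphas[:, :-1]], axis=1)
    trans = np.cumprod(shifted, axis=1)
    weights = trans * alphas                 # (R, S) rendering weights
    return np.einsum("rs,rsc->rc", weights, feats)

def distillation_loss(rendered, teacher):
    """Assumed squared-L2 loss between rendered features and 2D teacher features, averaged over rays."""
    return float(np.mean(np.sum((rendered - teacher) ** 2, axis=-1)))

# Usage: one ray whose first sample is fully opaque renders that sample's feature
sig = np.array([[1e3, 0.0, 0.0]])
f = np.array([[[1.0, 2.0], [5.0, 5.0], [9.0, 9.0]]])
out = render_features(sig, f, np.ones((1, 3)))
loss = distillation_loss(out, np.array([[1.0, 2.0]]))
```

Because the same weights render color and features, the features inherit the scene's geometry. Minimizing the loss across many views is what pushes 2D semantics into the 3D field.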
