CV | Sep 19, 2022

NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes

Zhiwen Fan, Peihao Wang, Yifan Jiang, Xinyu Gong, Dejia Xu, Zhangyang Wang

arXiv:2209.08776v626.181 citations | h-index: 62 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of expensive manual segmentation for downstream applications in 3D scene understanding, offering a self-supervised solution that is incremental but improves upon existing methods.

The paper tackles object segmentation in complex real-world scenes without per-view annotations. It proposes NeRF-SOS, a self-supervised framework that integrates segmentation with neural radiance fields. On the LLFF, Tank & Temple, and BlendedMVS datasets, the authors report that it outperforms 2D-based self-supervised baselines and predicts finer masks than supervised methods.

Neural volumetric representations have shown that Multi-layer Perceptrons (MLPs) can be optimized with multi-view calibrated images to represent scene geometry and appearance without explicit 3D supervision. Object segmentation can enrich many downstream applications based on the learned radiance field. However, introducing hand-crafted segmentation to define regions of interest in a complex real-world scene is non-trivial and expensive, as it requires per-view annotation. This paper explores self-supervised learning for object segmentation using NeRF for complex real-world scenes. Our framework, called NeRF with Self-supervised Object Segmentation (NeRF-SOS), couples object segmentation and neural radiance fields to segment objects in any view within a scene. By proposing a novel collaborative contrastive loss at both the appearance and geometry levels, NeRF-SOS encourages NeRF models to distill compact geometry-aware segmentation clusters from their density fields and from self-supervised pre-trained 2D visual features. The framework can be applied to various NeRF models, yielding both photo-realistic rendering results and convincing segmentation maps for indoor and outdoor scenarios. Extensive results on the LLFF, Tank & Temple, and BlendedMVS datasets validate the effectiveness of NeRF-SOS. It consistently surpasses other 2D-based self-supervised baselines and predicts finer semantic masks than existing supervised counterparts. Please refer to the video on our project page for more details: https://zhiwenfan.github.io/NeRF-SOS.
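The abstract does not spell out the collaborative contrastive loss, so the following is only a rough sketch of how a loss that combines appearance and geometry cues might be structured, not the authors' formulation. It assumes per-pixel feature vectors from some pre-trained 2D extractor, per-pixel rendered depths as the geometry cue, and per-pixel segmentation probabilities. The function names, the `shift` threshold, the exponential depth affinity, and the `w_geo` weight are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def appearance_affinity(features):
    """Pairwise cosine similarity of 2D visual features (assumed cue)."""
    return [[cosine(fi, fj) for fj in features] for fi in features]


def geometry_affinity(depths, scale=1.0):
    """Pairwise affinity that decays with depth difference (assumed cue)."""
    return [[math.exp(-abs(di - dj) / scale) for dj in depths] for di in depths]


def correlation_loss(affinity, seg_probs, shift=0.5):
    """Reward similar segmentation outputs for pairs whose affinity is
    above `shift`, and penalize them for pairs below it (hypothetical)."""
    n = len(seg_probs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            seg_sim = cosine(seg_probs[i], seg_probs[j])
            total += -(affinity[i][j] - shift) * seg_sim
    return total / (n * n)


def collaborative_loss(features, depths, seg_probs, w_geo=1.0):
    """Sum of the appearance-level and geometry-level terms."""
    loss_app = correlation_loss(appearance_affinity(features), seg_probs)
    loss_geo = correlation_loss(geometry_affinity(depths), seg_probs)
    return loss_app + w_geo * loss_geo


# Toy data: two "object" pixels (near, similar features) and two
# "background" pixels (far, different features).
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
depths = [1.0, 1.0, 5.0, 5.0]

# A segmentation that groups pixels consistently with both cues...
consistent = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
# ...and one that splits each group across the two clusters.
inconsistent = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]

loss_consistent = collaborative_loss(features, depths, consistent)
loss_inconsistent = collaborative_loss(features, depths, inconsistent)
```

In this toy setup the segmentation that agrees with both the feature and depth groupings receives the lower loss, which is the qualitative behavior the abstract describes: segmentation clusters are encouraged to be consistent with both appearance and geometry.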

