CV | Dec 7, 2022

SSDNeRF: Semantic Soft Decomposition of Neural Radiance Fields

Siddhant Ranade, Christoph Lassner, Kai Li, Christian Haene, Shen-Chi Chen, Jean-Charles Bazin, Sofien Bouaziz

arXiv:2212.03406v1 | 10.110 citations | h-index: 38

Originality: Incremental advance

AI Analysis

This work addresses the need for detailed 3D semantic representations in computer vision, enabling applications like video editing, but it is incremental as it builds upon existing NeRF methods.

The paper jointly encodes semantic and radiance signals in neural radiance fields (NeRFs) to obtain a soft decomposition of a scene into semantic parts. This allows multiple semantic classes that blend along the same direction to be encoded correctly. The authors report state-of-the-art segmentation and reconstruction results on a dataset of common objects, and demonstrate temporally consistent video editing and re-compositing on a dataset of casually captured selfie videos.

Neural Radiance Fields (NeRFs) encode the radiance in a scene parameterized by the scene's plenoptic function. This is achieved by using an MLP together with a mapping to a higher-dimensional space, and has been proven to capture scenes with a great level of detail. Naturally, the same parameterization can be used to encode additional properties of the scene, beyond just its radiance. A particularly interesting property in this regard is the semantic decomposition of the scene. We introduce a novel technique for semantic soft decomposition of neural radiance fields (named SSDNeRF) which jointly encodes semantic signals in combination with radiance signals of a scene. Our approach provides a soft decomposition of the scene into semantic parts, enabling us to correctly encode multiple semantic classes blending along the same direction -- an impossible feat for existing methods. Not only does this lead to a detailed, 3D semantic representation of the scene, but we also show that the regularizing effects of the MLP used for encoding help to improve the semantic representation. We show state-of-the-art segmentation and reconstruction results on a dataset of common objects and demonstrate how the proposed approach can be applied for high quality temporally consistent video editing and re-compositing on a dataset of casually captured selfie videos.
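To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of (a) a mapping of coordinates to a higher-dimensional space and (b) compositing soft per-class weights along a single ray, so that several classes can contribute to the same pixel. This is an illustration under assumptions, not the authors' implementation: the function names, the sinusoidal encoding, the per-sample softmax over class logits, and the standard volume-rendering weights are all choices made here for the example.

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    """Map coordinates to sin/cos features at frequencies 2^k * pi.

    Assumed form of the "mapping to a higher-dimensional space";
    the paper's exact encoding is not given in the abstract.
    """
    x = np.asarray(x, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_freqs)                 # (F,)
    scaled = x[..., None] * freqs * np.pi               # (..., D, F)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)               # (..., D * 2F)

def composite_ray(sigma, rgb, sem_logits, deltas):
    """Volume-render colour and soft per-class weights along one ray.

    sigma:      (N,)   densities at N samples (hypothetical MLP output)
    rgb:        (N, 3) colours at the samples
    sem_logits: (N, C) per-sample class logits (hypothetical MLP output)
    deltas:     (N,)   distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-sigma * deltas)               # per-sample opacity
    # Transmittance: fraction of light reaching each sample unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha                             # (N,)

    # Softmax over classes at each sample (numerically stabilised).
    z = sem_logits - sem_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

    color = weights @ rgb                               # (3,)
    class_weights = weights @ probs                     # (C,) soft decomposition
    return color, class_weights, weights.sum()

# A semi-transparent sample of class 0 in front of a dense sample of class 1:
sigma = np.array([1.0, 50.0])
deltas = np.array([0.5, 0.5])
rgb = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
sem_logits = np.array([[8.0, 0.0], [0.0, 8.0]])
color, class_weights, opacity = composite_ray(sigma, rgb, sem_logits, deltas)
```

In this toy ray both entries of `class_weights` are non-zero, which is the kind of blending a hard per-pixel label cannot express. The class weights sum to the accumulated opacity because the per-sample probabilities sum to one.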

