CVAIGRMar 21, 2022

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

ByteDance
arXiv:2203.10821v242 citationsh-index: 38
Originality Incremental advance
AI Analysis

This work addresses the challenge of 3D scene reconstruction from 2D semantic inputs for applications in image translation and manipulation, representing an incremental advance in NeRF-based generative models.

The paper tackles the problem of reconstructing a 3D scene from a single-view semantic mask by introducing the Sem2NeRF framework, which encodes the mask into a latent code for a pre-trained decoder and integrates a region-aware learning strategy, outperforming strong baselines on two benchmark datasets.

Image translation and manipulation have gain increasing attention along with the rapid development of deep generative models. Although existing approaches have brought impressive results, they mainly operated in 2D space. In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene modelled by NeRF, conditioned on one single-view semantic mask as input. To kick-off this novel task, we propose the Sem2NeRF framework. In particular, Sem2NeRF addresses the highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pre-trained decoder. To further improve the accuracy of the mapping, we integrate a new region-aware learning strategy into the design of both the encoder and the decoder. We verify the efficacy of the proposed Sem2NeRF and demonstrate that it outperforms several strong baselines on two benchmark datasets. Code and video are available at https://donydchen.github.io/sem2nerf/

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes