GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
This work addresses the need for enhanced 3D scene understanding in computer vision, though it appears incremental as it builds upon existing NeRF methods.
The paper tackles the problem of synthesizing novel-view images and semantic maps for unseen scenes by introducing GSNeRF, which incorporates semantics into Neural Radiance Fields, resulting in improved performance over prior works.
Utilizing multi-view inputs to synthesize novel-view images, Neural Radiance Fields (NeRF) have emerged as a popular research topic in 3D vision. In this work, we introduce a Generalizable Semantic Neural Radiance Field (GSNeRF), which uniquely takes image semantics into the synthesis process so that both novel-view images and the associated semantic maps can be produced for unseen scenes. Our GSNeRF is composed of two stages: Semantic Geo-Reasoning and Depth-Guided Visual Rendering. The former observes multi-view image inputs to extract semantic and geometry features from a scene. Guided by the resulting image geometry information, the latter performs both image and semantic rendering with improved performance. Our experiments confirm that GSNeRF performs favorably against prior works on both novel-view image and semantic segmentation synthesis, and further verify the effectiveness of our sampling strategy for visual rendering.
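To make the second stage more concrete, the following is a minimal sketch of what depth-guided sampling followed by joint image and semantic rendering could look like. It is not the GSNeRF implementation: the abstract does not specify the sampling rule or the rendering equations, so the sketch assumes standard NeRF-style volume rendering, a fixed sampling window around a per-ray depth estimate, and semantic logits composited with the same weights as color. All function names and parameters (`depth_guided_samples`, `half_width`, `render`, and so on) are hypothetical.

```python
import numpy as np

def depth_guided_samples(depth, near, far, n_samples=8, half_width=0.1):
    """Place samples in a narrow window around a per-ray depth estimate.

    Assumption: a fixed half_width; the actual method may size the window
    differently (e.g. from a predicted uncertainty). Requires n_samples >= 2.
    """
    depth = np.asarray(depth, dtype=np.float64)
    lo = np.clip(depth - half_width, near, far)
    hi = np.clip(depth + half_width, near, far)
    steps = np.linspace(0.0, 1.0, n_samples)
    return lo[:, None] + (hi - lo)[:, None] * steps[None, :]  # (rays, samples)

def render_weights(density, t_vals):
    """Standard volume-rendering weights: w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    deltas = np.diff(t_vals, axis=-1)
    # Reuse the last interval length for the final sample.
    deltas = np.concatenate([deltas, deltas[:, -1:]], axis=-1)
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance T_i is the product of (1 - alpha_j) over all j < i.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=-1)
    return alpha * trans

def render(density, rgb, sem_logits, t_vals):
    """Composite color, semantic logits, and depth with shared weights."""
    w = render_weights(density, t_vals)
    color = (w[..., None] * rgb).sum(axis=1)             # (rays, 3)
    semantics = (w[..., None] * sem_logits).sum(axis=1)  # (rays, classes)
    depth = (w * t_vals).sum(axis=1)                     # (rays,)
    return color, semantics, depth

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two rays with hypothetical depth estimates from the first stage.
    t_vals = depth_guided_samples([2.0, 5.0], near=0.5, far=6.0)
    density = rng.uniform(0.0, 20.0, size=t_vals.shape)
    rgb = rng.uniform(0.0, 1.0, size=t_vals.shape + (3,))
    sem_logits = rng.normal(size=t_vals.shape + (5,))
    color, semantics, depth = render(density, rgb, sem_logits, t_vals)
    print(color.shape, semantics.shape, depth.shape)
```

Concentrating samples near the estimated surface is what lets a small sample budget cover the region that contributes most to the rendered pixel; sharing one set of weights between color and semantics keeps the two outputs geometrically consistent under these assumptions.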