InvNeRF-Seg: Fine-Tuning a Pre-Trained NeRF for 3D Object Segmentation
This work addresses the need for efficient 3D object segmentation from 2D images, which matters for applications in robotics and agriculture. The contribution is incremental, however, because it builds on existing NeRF techniques.
The paper tackles the segmentation of 3D scenes reconstructed by NeRF, which is essential for tasks such as object counting and scene understanding. It proposes InvNeRF-Seg, a fine-tuning method that outperforms existing methods (SA3D and FruitNeRF) on synthetic and real-world datasets.
Neural Radiance Fields (NeRF) have been widely adopted for reconstructing high-quality 3D point clouds from 2D RGB images. However, segmenting these reconstructed 3D scenes is essential for downstream tasks such as object counting, size estimation, and scene understanding. Segmentation on raw 3D point clouds with deep learning requires labor-intensive, time-consuming manual annotation, while training NeRF directly on binary masks fails because masks lack the color and shading cues essential for geometry learning. We propose Invariant NeRF for Segmentation (InvNeRF-Seg), a two-step, zero-change fine-tuning strategy for 3D segmentation. We first train a standard NeRF on RGB images and then fine-tune it on 2D segmentation masks without altering either the model architecture or the loss function. This approach produces higher-quality, cleaner segmented point clouds directly from the refined radiance field with minimal added computational overhead or complexity. Field density analysis reveals consistent semantic refinement: densities in object regions increase while background densities are suppressed, yielding clean and interpretable segmentations. We demonstrate InvNeRF-Seg's superior performance over both SA3D and FruitNeRF on synthetic fruit and real-world soybean datasets. This approach effectively extends 2D segmentation to high-quality 3D segmentation.
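To make the two-step idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard NeRF volume-rendering quadrature for a single ray and a mean-squared-error photometric loss; the abstract specifies neither. The helper names (`ray_weights`, `render_ray`, `photometric_loss`, `mask_to_target`), the choice to replicate a binary mask value across three channels, and all sample values are illustrative assumptions.

```python
import numpy as np

def ray_weights(sigma, delta):
    """Compositing weights for one ray (standard NeRF quadrature)."""
    alpha = 1.0 - np.exp(-sigma * delta)            # opacity of each sample
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    return trans * alpha

def render_ray(sigma, rgb, delta):
    """Render one pixel as the weighted sum of per-sample colors."""
    return ray_weights(sigma, delta) @ rgb          # shape (3,)

def photometric_loss(pred, target):
    """The same MSE loss is used in both stages (assumed loss form)."""
    return float(np.mean((pred - target) ** 2))

def mask_to_target(mask_value):
    """Assumption: a binary mask pixel is replicated to three channels so
    the unchanged renderer and loss can consume it like an RGB pixel."""
    return np.full(3, float(mask_value))

# Toy ray with two samples: empty space, then a dense object sample.
delta = np.array([0.1, 0.1])
sigma = np.array([0.0, 50.0])

# Stage 1: supervise the rendered pixel with an RGB target.
rgb_stage1 = np.array([[0.2, 0.2, 0.2], [0.9, 0.1, 0.1]])
stage1_loss = photometric_loss(render_ray(sigma, rgb_stage1, delta),
                               np.array([0.9, 0.1, 0.1]))

# Stage 2: same renderer and loss, but the target is now a mask pixel.
rgb_stage2 = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
stage2_loss = photometric_loss(render_ray(sigma, rgb_stage2, delta),
                               mask_to_target(1))

# A background pixel (mask = 0) is matched exactly once density is suppressed.
sigma_bg = np.array([0.0, 0.0])
stage2_bg_loss = photometric_loss(render_ray(sigma_bg, rgb_stage2, delta),
                                  mask_to_target(0))
```

In this toy ray, the only difference between the two stages is the target passed to `photometric_loss`; the renderer and the loss are shared, which is the property the abstract calls zero-change. The background case also hints at why fine-tuning on masks would push background densities toward zero: a ray with no density renders black, which matches a mask value of 0.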