Feature-Realistic Neural Fusion for Real-Time, Open Set Scene Understanding
This work addresses the need for flexible semantic representation in robotics so that unknown objects can be handled, though the contribution appears incremental because it builds on existing neural field and SLAM methods.
The paper tackles the problem of real-time, open set scene understanding for robotics by fusing learned features into a 3D neural field during SLAM, enabling robust segmentation of novel objects with minimal human labeling.
General scene understanding for robotics requires flexible semantic representation, so that novel objects and structures that may not have been known at training time can be identified, segmented and grouped. We present an algorithm which fuses general learned features from a standard pre-trained network into a highly efficient 3D geometric neural field representation during real-time SLAM. The fused 3D feature maps inherit the coherence of the neural field's geometry representation. This means that tiny amounts of human labelling, supplied interactively at runtime, enable objects or even parts of objects to be robustly and accurately segmented in an open set manner.
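To make the idea of sparse labels acting on fused features concrete, the following is a minimal sketch, not the method described in the paper. It assumes each 3D point already carries a fused feature vector and swaps in a simple nearest-prototype rule with cosine similarity in place of whatever classifier the authors use. The function names (`normalize`, `class_prototypes`, `segment`), the toy feature values, and the class names are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def class_prototypes(features, labels):
    """Average the unit-length features of the labelled points, per class.

    `labels` maps a point index to a class name (the sparse user clicks).
    """
    sums, counts = {}, {}
    for idx, cls in labels.items():
        f = normalize(features[idx])
        if cls not in sums:
            sums[cls] = [0.0] * len(f)
            counts[cls] = 0
        sums[cls] = [s + x for s, x in zip(sums[cls], f)]
        counts[cls] += 1
    return {c: normalize([s / counts[c] for s in sums[c]]) for c in sums}


def segment(features, labels):
    """Assign every point the class whose prototype is most similar (cosine)."""
    protos = class_prototypes(features, labels)
    result = []
    for f in features:
        fn = normalize(f)
        best = max(protos, key=lambda c: sum(a * b for a, b in zip(fn, protos[c])))
        result.append(best)
    return result


# Toy stand-in for fused per-point features (assumed values, 3-D for brevity).
features = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.8, 0.0],
    [0.8, 0.0, 0.1],
]
# Two user clicks: point 0 is a "mug", point 2 is a "table".
clicks = {0: "mug", 2: "table"}

print(segment(features, clicks))
```

With only two labelled points, the remaining points are assigned to whichever labelled class their features most resemble. This illustrates why coherent fused features could let a few clicks propagate to a whole object, under the assumptions stated above.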