CV, LG, RO (Aug 22, 2022)

SCONE: Surface Coverage Optimization in Unknown Environments by Volumetric Integration

arXiv:2208.10449v2, 18 citations, h-index: 75
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient and accurate 3D scene reconstruction for robotics applications, offering a hybrid approach that combines the benefits of volumetric and surface-based representations.

The paper tackles the Next Best View (NBV) problem in robotics by proposing SCONE, a method that maximizes surface coverage through volumetric integration, achieving scalability to large scenes and handling free camera motion with improved accuracy over existing volumetric methods.

Next Best View (NBV) computation is a long-standing problem in robotics: it consists of identifying the next most informative sensor position(s) for reconstructing a 3D object or scene efficiently and accurately. Like most current methods, we consider NBV prediction from a depth sensor such as a Lidar system. Learning-based methods that rely on a volumetric representation of the scene are suitable for path planning, but are less accurate than methods that use a surface-based representation. The latter, however, do not scale well with the size of the scene and constrain the camera to a small number of poses. To obtain the advantages of both representations, we show that surface metrics can be maximized by Monte Carlo integration over a volumetric representation. In particular, we propose an approach, SCONE, that relies on two neural modules. The first module predicts occupancy probability in the entire volume of the scene. Given any new camera pose, the second module samples points in the scene based on their occupancy probability and leverages a self-attention mechanism to predict the visibility of the samples. Finally, we integrate the visibility to evaluate the gain in surface coverage for the new camera pose. The NBV is selected as the pose that maximizes the gain in total surface coverage. Our method scales to large scenes and handles free camera motion: it takes as input an arbitrarily large point cloud gathered by a depth sensor, together with the camera poses, and predicts the NBV. We demonstrate our approach on a novel dataset made of large and complex 3D scenes.
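The pipeline described in the abstract (predict occupancy, sample points according to it, predict their visibility, integrate to get a coverage gain, pick the best pose) can be illustrated with a minimal sketch. This is not the authors' implementation: the two neural modules are replaced by hand-written toy functions, and every name below (`occupancy_probability`, `visibility`, `coverage_gain`, `next_best_view`), the unit-sphere scene, and the sample counts are assumptions made purely for illustration.

```python
import numpy as np

def occupancy_probability(points):
    # Hypothetical stand-in for the first module: a soft shell around a
    # unit sphere plays the role of the predicted occupancy field.
    r = np.linalg.norm(points, axis=1)
    return np.exp(-((r - 1.0) ** 2) / (2 * 0.05 ** 2))

def visibility(points, camera_pos):
    # Hypothetical stand-in for the second module: a point counts as
    # visible if its outward direction faces the camera (no attention here).
    normals = points / np.linalg.norm(points, axis=1, keepdims=True)
    to_cam = camera_pos - points
    to_cam = to_cam / np.linalg.norm(to_cam, axis=1, keepdims=True)
    return (np.sum(normals * to_cam, axis=1) > 0).astype(float)

def coverage_gain(camera_pos, previous_cams, n_proposals=20000,
                  n_samples=4000, seed=0):
    rng = np.random.default_rng(seed)
    # Draw proposal points uniformly in a box around the toy scene.
    proposals = rng.uniform(-1.5, 1.5, size=(n_proposals, 3))
    occ = occupancy_probability(proposals)
    # Sample points in proportion to their occupancy probability.
    idx = rng.choice(n_proposals, size=n_samples, p=occ / occ.sum())
    samples = proposals[idx]
    # Points already seen from earlier poses contribute no new coverage.
    seen = np.zeros(n_samples)
    for cam in previous_cams:
        seen = np.maximum(seen, visibility(samples, cam))
    new_vis = visibility(samples, camera_pos)
    # Monte Carlo estimate: mean of newly visible mass over the samples.
    return float(np.mean(new_vis * (1.0 - seen)))

def next_best_view(candidates, previous_cams):
    # Score every candidate pose and keep the one with the largest gain.
    gains = [coverage_gain(c, previous_cams) for c in candidates]
    return int(np.argmax(gains)), gains

previous = [np.array([3.0, 0.0, 0.0])]
candidates = [np.array([3.0, 0.0, 0.0]),
              np.array([0.0, 3.0, 0.0]),
              np.array([-3.0, 0.0, 0.0])]
best, gains = next_best_view(candidates, previous)
print("best candidate index:", best)
print("estimated gains:", gains)
```

In this toy setup, a candidate that repeats an earlier pose adds no coverage, while a pose on the opposite side of the object adds the most. Reusing the same seed for every candidate means all poses are scored on the same sampled points, which keeps the comparison between candidates consistent.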

Code Implementations: 1 repo
