Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation
This work addresses the need for efficient, precise 3D interaction tools for tasks such as object selection and manipulation in digital 3D environments. It represents a strong, targeted gain rather than a foundational advance.
The paper tackles 3D interactive segmentation with a simple method that combines a voxel-based sparse encoder with a lightweight transformer-based decoder. It achieves state-of-the-art performance on benchmark datasets such as ScanNet and KITTI-360, with substantial gains in both efficiency and precision.
The increasing availability of digital 3D environments, whether obtained through image-based 3D reconstruction, generation, or scans acquired by robots, is driving innovation across many applications. These environments create significant demand for 3D interaction, such as 3D interactive segmentation, which supports tasks like object selection and manipulation. There is also a persistent need for solutions that are efficient, precise, and perform well across diverse settings, particularly in unseen environments and with unfamiliar objects. In this work, we introduce a 3D interactive segmentation method that consistently surpasses previous state-of-the-art techniques on both in-domain and out-of-domain datasets. Our simple approach integrates a voxel-based sparse encoder with a lightweight transformer-based decoder that implements implicit click fusion, achieving superior performance while maximizing efficiency. Our method demonstrates substantial improvements on benchmark datasets, including ScanNet, ScanNet++, S3DIS, and KITTI-360, as well as on unseen geometric distributions such as those produced by Gaussian Splatting. The project web page is available at https://simonelli-andrea.github.io/easy3d.
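The abstract names three ingredients (a sparse voxel representation, clicks handled inside a transformer decoder, and implicit click fusion) without spelling them out. The toy sketch below illustrates one plausible reading in pure Python; it is not Easy3D's implementation. Its assumptions: each voxel's feature is simply the mean of its points' features (a real encoder would be a learned sparse convolutional network), each click becomes a decoder query initialised from its voxel's feature and refined by a single cross-attention step with no learned projections, and "implicit fusion" is modelled as labelling a voxel foreground when its best positive-click score beats its best negative-click score. The function names and the `background_threshold` parameter (used only when there are no negative clicks) are hypothetical.

```python
import math


def voxelize(points, features, voxel_size):
    """Sparse voxelization: only occupied voxels are stored, each holding the
    mean feature of the points inside it (stand-in for a learned sparse encoder)."""
    sums, counts, point_to_voxel = {}, {}, []
    for p, f in zip(points, features):
        key = tuple(math.floor(c / voxel_size) for c in p)
        point_to_voxel.append(key)
        if key not in sums:
            sums[key], counts[key] = [0.0] * len(f), 0
        sums[key] = [s + x for s, x in zip(sums[key], f)]
        counts[key] += 1
    voxels = {k: [s / counts[k] for s in v] for k, v in sums.items()}
    return voxels, point_to_voxel


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, keys):
    """One head of scaled dot-product attention from a click query to all voxel
    features, plus a residual connection (no learned projections in this toy)."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, k) * scale for k in keys]
    m = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    attended = [sum(w * k[i] for w, k in zip(weights, keys)) / total
                for i in range(len(query))]
    return [q + a for q, a in zip(query, attended)]


def segment(points, features, clicks, voxel_size=1.0, background_threshold=1.0):
    """clicks: list of (point_index, is_positive). Returns one bool per point."""
    voxels, point_to_voxel = voxelize(points, features, voxel_size)
    keys = list(voxels.values())
    # Hypothetical decoder: every click is a query seeded from its voxel's feature.
    queries = [(cross_attend(voxels[point_to_voxel[i]], keys), pos)
               for i, pos in clicks]
    mask = {}
    for key, feat in voxels.items():
        pos_best = max((dot(q, feat) for q, pos in queries if pos), default=-math.inf)
        neg_best = max((dot(q, feat) for q, pos in queries if not pos),
                       default=background_threshold)
        # Assumed fusion rule: foreground iff the best positive click wins.
        mask[key] = pos_best > neg_best
    return [mask[k] for k in point_to_voxel]


# Two well-separated clusters with distinct (made-up) features.
points = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.0), (1.2, 0.4, 0.0),
          (5.1, 5.0, 0.0), (5.4, 5.2, 0.0), (6.3, 5.1, 0.0)]
features = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
print(segment(points, features, clicks=[(0, True), (3, False)]))
# -> [True, True, True, False, False, False]
```

Because every click is scored independently and the clicks are combined only by a max over each group, adding or removing a click requires no separate fusion module. That is the property this sketch uses to stand in for "implicit" fusion.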