Towards Part-Based Understanding of RGB-D Scans
This work addresses finer-grained, part-level object understanding, which is needed to enable interaction with objects and functional reasoning in 3D environments, with relevance to applications such as robotics and augmented reality.
This paper introduces the task of part-based scene understanding for RGB-D scans, aiming to decompose detected objects into geometric part masks. The proposed method, which leverages an intermediary part graph representation and part priors, significantly outperforms alternative approaches for semantic part completion.
Recent advances in 3D semantic scene understanding have shown impressive progress in 3D instance segmentation, enabling object-level reasoning about 3D scenes; however, a finer-grained understanding is required to enable interactions with objects and reasoning about their function. Thus, we propose the task of part-based scene understanding of real-world 3D environments: from an RGB-D scan of a scene, we detect objects and, for each object, predict its decomposition into geometric part masks, which together compose the complete geometry of the observed object. We leverage an intermediary part graph representation to enable robust completion as well as the building of part priors, which we use to construct the final part mask predictions. Our experiments demonstrate that guiding part understanding through a part graph to part prior-based predictions significantly outperforms alternative approaches to the task of semantic part completion.
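To make the output representation concrete, the following is a minimal sketch of how part masks attached to a part graph could be composed into the complete geometry of one object. It is an illustration under stated assumptions, not the paper's implementation: the `PartNode` class, the `leaf_parts` and `compose_object` helpers, and the choice to store each mask as a set of occupied voxel coordinates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumption: a part mask is stored as a set of occupied (x, y, z) voxel indices.
Voxel = Tuple[int, int, int]


@dataclass
class PartNode:
    """Hypothetical part-graph node: a named part, its mask, and its sub-parts."""
    name: str
    mask: Set[Voxel] = field(default_factory=set)
    children: List["PartNode"] = field(default_factory=list)


def leaf_parts(node: PartNode) -> List[PartNode]:
    """Collect the leaf parts of the graph in depth-first order."""
    if not node.children:
        return [node]
    leaves: List[PartNode] = []
    for child in node.children:
        leaves.extend(leaf_parts(child))
    return leaves


def compose_object(root: PartNode) -> Set[Voxel]:
    """Union the leaf part masks to obtain the complete object occupancy."""
    occupied: Set[Voxel] = set()
    for part in leaf_parts(root):
        occupied |= part.mask
    return occupied


# Toy chair: a one-voxel leg under a grouping node, a 2x2 seat, and a 2x2 back.
leg = PartNode("leg", {(0, 0, 0)})
seat = PartNode("seat", {(x, 1, z) for x in range(2) for z in range(2)})
back = PartNode("back", {(x, y, 0) for x in range(2) for y in range(2, 4)})
chair = PartNode("chair", children=[PartNode("base", children=[leg]), seat, back])

complete_geometry = compose_object(chair)
part_names = [part.name for part in leaf_parts(chair)]
```

In this toy setup, only leaf nodes carry masks and interior nodes such as `base` merely group them; whether masks live on leaves or on every node is a design choice of the sketch rather than something the abstract specifies.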