MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation
This work addresses the problem of limited labeled data for fine-grained 3D segmentation with a hybrid approach that combines 2D and 3D techniques. Its novelty is incremental, since it builds on existing self-supervised and contrastive learning methods.
The paper tackles fine-grained 3D shape segmentation by applying self-supervised learning to 2D multi-view renderings in order to learn dense correspondences. It reports improved performance over state-of-the-art methods on both textured and untextured datasets, with larger gains when training with sparse views or on textured shapes.
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. This is inspired by the observation that view-based surface representations are more effective at modeling high-resolution surface details and texture than their 3D counterparts based on point clouds or voxel occupancy. Specifically, given a 3D shape, we render it from multiple views, and set up a dense correspondence learning task within the contrastive learning framework. As a result, the learned 2D representations are view-invariant and geometrically consistent, leading to better generalization when trained on a limited number of labeled shapes compared to alternatives that utilize self-supervision in 2D or 3D alone. Experiments on textured (RenderPeople) and untextured (PartNet) 3D datasets show that our method outperforms state-of-the-art alternatives in fine-grained part segmentation. The improvements over baselines are greater when only a sparse set of views is available for training or when shapes are textured, indicating that MvDeCor benefits from both 2D processing and 3D geometric reasoning.
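To make the dense correspondence objective more concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss over matched pixel embeddings from two rendered views. It is an illustration under stated assumptions, not the authors' implementation: the function name `dense_info_nce`, the temperature value, and the use of all other sampled pixels as negatives are hypothetical choices, and the pixel correspondences (row i of each array showing the same surface point) are assumed to have been computed beforehand from the known rendering geometry.

```python
import numpy as np

def dense_info_nce(feat_a, feat_b, temperature=0.07):
    """InfoNCE-style loss over corresponding pixel embeddings (illustrative sketch).

    feat_a, feat_b: arrays of shape (N, D). Row i of feat_a and row i of
    feat_b are assumed to be embeddings of the same 3D surface point seen
    in two different rendered views. Every other row acts as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)

    # Similarity of every pixel in view A to every pixel in view B.
    logits = (a @ b.T) / temperature              # shape (N, N)

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for row i is column i (the corresponding pixel).
    return float(-np.mean(np.diag(log_prob)))
```

Minimizing a loss of this kind pulls embeddings of corresponding pixels together and pushes non-corresponding ones apart, which is one way to obtain the view-invariant features the abstract describes. In practice the embeddings would come from a 2D network applied to each rendered view, and the pretrained network would then be fine-tuned on the small labeled set.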