Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation
This work addresses 3D scene segmentation for computer vision applications. It offers a novel method that improves accuracy, though the contribution is incremental, since it builds on existing Gaussian-based novel-view synthesis techniques.
The paper tackles the problem of 3D scene segmentation by introducing Contrastive Gaussian Clustering, which uses 3D Gaussians with segmentation features to generate segmentation masks from any viewpoint, resulting in an 8% improvement in IoU accuracy over state-of-the-art methods.
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting the Gaussians onto it and then $\alpha$-blending their colors. Following this example, we train a model that also includes a segmentation feature vector for each Gaussian. These feature vectors can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors, and to generate 2D segmentation masks, by projecting the Gaussians onto a plane and $\alpha$-blending their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks and still learn to generate segmentation masks that are consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by $+8\%$ over the state of the art. Code and trained models will be released soon.
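To make the $\alpha$-blending step concrete, the following is a minimal sketch of how per-Gaussian segmentation features could be composited into a single pixel feature. It assumes the standard front-to-back compositing rule used in Gaussian-based novel-view synthesis, with the Gaussians covering the pixel already projected and sorted from nearest to farthest. The function name `alpha_blend_features` and its arguments are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def alpha_blend_features(features, alphas):
    """Blend per-Gaussian feature vectors for one pixel (illustrative sketch).

    features: (N, D) array, one feature vector per Gaussian, sorted near to far.
    alphas:   (N,) array of per-Gaussian opacities in [0, 1] at this pixel.
    Assumes N >= 1. Names and signature are hypothetical.
    """
    features = np.asarray(features, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance reaching each Gaussian: product of (1 - alpha) over all
    # Gaussians in front of it; the nearest Gaussian sees transmittance 1.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    # Blending weight of each Gaussian is its opacity times that transmittance.
    weights = alphas * transmittance
    # Weighted sum of the feature vectors gives the rendered pixel feature.
    return weights @ features

# Two Gaussians with orthogonal features and equal opacity.
pixel_feature = alpha_blend_features([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

In this example the nearer Gaussian contributes with weight 0.5 and the farther one with weight 0.25, so the rendered feature leans toward the nearer Gaussian. The same weights would be applied to colors when rendering an image, which is why a segmentation feature can be rendered alongside appearance.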