CV · Jan 11, 2023

Self-Supervised Generative-Contrastive Learning of Multi-Modal Euclidean Input for 3D Shape Latent Representations: A Dynamic Switching Approach

arXiv:2301.04612v2 · 6 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the challenge of integrating multi-modal data for 3D shape analysis. The contribution is incremental in nature.

The paper tackles the problem of learning latent representations for 3D volumetric shapes. It proposes a self-supervised method that combines generative and contrastive learning with a dynamic switching approach, and the authors report improved reconstruction and classification performance.

We propose a combined generative and contrastive neural architecture for learning latent representations of 3D volumetric shapes. The architecture uses two encoder branches for voxel grids and multi-view images from the same underlying shape. The main idea is to combine a contrastive loss between the resulting latent representations with an additional reconstruction loss. That helps to avoid collapsing the latent representations as a trivial solution for minimizing the contrastive loss. A novel dynamic switching approach is used to cross-train two encoders with a shared decoder. The switching approach also enables the stop gradient operation on a random branch. Further classification experiments show that the latent representations learned with our self-supervised method integrate more useful information from the additional input data implicitly, thus leading to better reconstruction and classification performance.
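
The interplay of the two loss terms described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the negative cosine similarity, the mean squared error, the `recon_weight` parameter, the function names, and the rule that the branch not feeding the decoder is the one marked for stop gradient are all assumptions made for illustration. Plain Python has no automatic differentiation, so the stop gradient is only represented as a label.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two latent vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(z_voxel, z_image):
    # Assumption: negative cosine similarity stands in for the paper's
    # contrastive term; it is minimal (-1) when the latents are aligned.
    return -cosine_similarity(z_voxel, z_image)


def reconstruction_loss(target, output):
    # Assumption: mean squared error stands in for the reconstruction term.
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


def switched_loss(z_voxel, z_image, target, decode, rng, recon_weight=1.0):
    """Combine both terms, randomly switching which branch feeds the decoder.

    Returns the total loss, the branch that fed the shared decoder, and the
    branch that would receive a stop gradient (hypothetical rule: the other one).
    """
    active = rng.choice(["voxel", "image"])
    frozen = "image" if active == "voxel" else "voxel"
    z_active = z_voxel if active == "voxel" else z_image
    recon = reconstruction_loss(target, decode(z_active))
    total = contrastive_loss(z_voxel, z_image) + recon_weight * recon
    return total, active, frozen


def constant_decoder(z):
    # A decoder that ignores its input, as would happen after collapse.
    return [0.5, 0.5, 0.5, 0.5]


rng = random.Random(0)
collapsed = [1.0, 1.0]            # both encoders emit the same constant latent
target = [0.0, 1.0, 0.0, 1.0]     # flattened toy "voxel grid"
loss, active, frozen = switched_loss(
    collapsed, collapsed, target, constant_decoder, rng
)
print(loss, active, frozen)
```

In this toy case the collapsed latents already minimize the contrastive term, yet the reconstruction term stays positive, so the combined objective still penalizes the trivial solution. That is the motivation the abstract gives for adding the reconstruction loss.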

