LG | AI | Feb 7, 2022

Geometric Multimodal Contrastive Representation Learning

arXiv:2202.03390v4 | 69 citations
AI Analysis

This paper addresses the problem of handling heterogeneous and incomplete multimodal data, a concern for machine learning researchers and practitioners. It represents an incremental improvement: a novel method for a known bottleneck.

The paper tackles the challenge of learning multimodal representations that are robust to missing modalities at test time by introducing a Geometric Multimodal Contrastive (GMC) method, which achieves state-of-the-art performance on prediction and reinforcement learning tasks with missing modality information.

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method with two main components: i) a two-level architecture consisting of modality-specific base encoders, which map an arbitrary number of modalities to intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
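The two components described in the abstract can be illustrated with a small, untrained sketch. It is not the authors' implementation. The class name `TwoLevelEncoder`, the single linear layer per encoder, the `tanh` nonlinearity, the temperature value, and the InfoNCE-style form of `contrastive_loss` are all assumptions chosen for illustration. The abstract only states that the loss is contrastive and encourages geometric alignment.

```python
import math
import random


def linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weight matrix as nested lists (hypothetical init)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-1, 1) * scale for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def l2_normalize(v):
    """Scale v to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


class TwoLevelEncoder:
    """Modality-specific base encoders followed by one shared projection head."""

    def __init__(self, modality_dims, intermediate_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        # Level 1: one base encoder per modality, all with the same output size.
        self.base = {m: linear(d, intermediate_dim, rng) for m, d in modality_dims.items()}
        # Level 2: a single projection head shared by every modality.
        self.head = linear(intermediate_dim, latent_dim, rng)

    def encode(self, modality, x):
        h = [math.tanh(a) for a in apply(self.base[modality], x)]
        return l2_normalize(apply(self.head, h))


def contrastive_loss(za, zb, temperature=0.1):
    """InfoNCE-style loss (an assumption): za[i] and zb[i] form the positive pair,
    and every zb[j] with j != i acts as a negative."""
    total = 0.0
    for i in range(len(za)):
        logits = [sum(p * q for p, q in zip(za[i], zb[j])) / temperature
                  for j in range(len(zb))]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(za)


# Usage: two modalities of different input size land in the same latent space.
model = TwoLevelEncoder({"image": 8, "audio": 5}, intermediate_dim=6, latent_dim=4)
rng = random.Random(1)
images = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]
audio = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(3)]
z_img = [model.encode("image", x) for x in images]
z_aud = [model.encode("audio", x) for x in audio]
loss = contrastive_loss(z_img, z_aud)
```

Because every modality passes through the same projection head, a representation computed from a single available modality lives in the same latent space as any other. This is what makes it usable when other modalities are missing at test time. Minimizing a loss of this kind during training would pull paired representations together and push unpaired ones apart.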

Code Implementations: 1 repo
