LG QM Mar 2

CoVAE: correlated multimodal generative modeling

arXiv:2603.01965v1

Originality: Incremental advance

AI Analysis

The paper addresses a known weakness of multimodal VAEs: latent-space fusion strategies destroy the joint statistical structure of multimodal data. The contribution appears incremental, since it builds on existing VAE frameworks.

The paper introduces CoVAE, a multimodal generative architecture that captures correlations between modalities, and reports accurate cross-modal reconstruction and effective uncertainty quantification on real and synthetic datasets.

Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure of the multimodal data, with profound implications for generation and uncertainty quantification. In this work, we introduce Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities. We test CoVAE on a number of real and synthetic data sets demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.
