LG, ML | May 6, 2021

Generalized Multimodal ELBO

arXiv:2105.02470v2 | 135 citations
AI Analysis

This paper addresses a long-standing goal in machine learning: learning from co-occurring data types. The contribution appears incremental, since it builds on and generalizes existing ELBO methods.

The paper tackles multimodal self-supervised generative modeling. Existing ELBO approximations force a trade-off between semantic coherence and the ability to learn the joint data distribution. The proposed generalized ELBO formulation includes two previous methods as special cases and combines their benefits, and the experiments show an advantage over state-of-the-art models.

Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.
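
The abstract does not spell out the objective or name the two earlier methods. As a rough illustration only, the sketch below assumes they are a product-of-experts and a mixture-of-experts joint posterior, and that the generalized posterior is a uniform mixture of product-of-experts components, one per non-empty subset of modalities. The 1-D Gaussian experts, the modality names, and the helper functions are hypothetical and are not taken from the paper's code; no prior expert is included.

```python
from itertools import combinations


def product_of_experts(experts):
    """Fuse 1-D Gaussian experts (mean, variance) by multiplying their densities."""
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision


def subset_posteriors(experts):
    """Return one product-of-experts component per non-empty subset of modalities."""
    names = sorted(experts)
    components = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            components[subset] = product_of_experts([experts[n] for n in subset])
    return components


# Hypothetical unimodal posteriors (mean, variance) for two modalities.
experts = {"image": (0.0, 1.0), "text": (2.0, 1.0)}
components = subset_posteriors(experts)

for subset, (mean, var) in components.items():
    print(subset, mean, var)
```

Under these assumptions, keeping only the single-modality components gives a mixture-of-experts posterior, and keeping only the component for the full set gives a product-of-experts posterior, which is one way to read the claim that the new objective contains two previous methods as special cases.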

Code Implementations: 1 repo
