LG ML Apr 11, 2022

Mixture-of-experts VAEs can disregard variation in surjective multimodal data

Jannik Wolff, Tassilo Klein, Moin Nabi, Rahul G. Krishnan, Shinichi Nakajima

arXiv:2204.05229v13.32 citations h-index: 27

Originality: Synthesis-oriented

AI Analysis

This work addresses a limitation in multimodal learning for applications like healthcare, but it is incremental: it characterizes a specific failure mode rather than proposing a new solution.

The paper tackles the problem of multimodal VAEs failing to capture variability in surjective data, where one modality (e.g., labels) maps to multiple instances in another (e.g., images), and demonstrates this issue theoretically and empirically.

Machine learning systems are often deployed in domains that entail data from multiple modalities; for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational autoencoders (VAEs) that generate several modalities. We consider surjective data, where single datapoints from one modality (such as class labels) describe multiple datapoints from another modality (such as images). We theoretically and empirically demonstrate that multimodal VAEs with a mixture-of-experts posterior can struggle to capture variability in such surjective data.
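As a rough illustration of the two notions in the abstract, the sketch below builds a toy surjective dataset (several images share one label) and samples from a mixture-of-experts posterior by picking one unimodal expert at random. Everything here is an assumption made for illustration and is not taken from the paper: the toy `pairs` data, the helper names, the uniform mixture weights, and the scalar Gaussian experts.

```python
import random
from collections import defaultdict

# Hypothetical toy dataset: (image_id, label) pairs.
# Several images share one label, so the image -> label map is many-to-one.
pairs = [
    ("img0", "cat"), ("img1", "cat"),
    ("img2", "dog"), ("img3", "dog"), ("img4", "dog"),
]


def images_per_label(pairs):
    """Group image ids by the label that describes them."""
    groups = defaultdict(list)
    for image_id, label in pairs:
        groups[label].append(image_id)
    return dict(groups)


def is_surjective_many_to_one(pairs, labels):
    """True if every label is used and at least one label covers several images."""
    groups = images_per_label(pairs)
    covers_all_labels = set(groups) == set(labels)
    some_label_shared = any(len(ids) > 1 for ids in groups.values())
    return covers_all_labels and some_label_shared


def sample_moe_posterior(experts, rng):
    """Sample a scalar latent from a uniform mixture of Gaussian experts.

    experts: list of (mean, std) tuples, one per modality (assumed form).
    One expert is chosen uniformly at random, then sampled from.
    """
    mean, std = rng.choice(experts)
    return rng.gauss(mean, std)
```

When the label expert is the one selected, the latent sample is the same for every image sharing that label, which gives an intuition for why image-level variation can be hard to capture in this setting.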
