Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence
This work addresses the challenge of learning from multiple data types and is aimed at researchers in multimodal machine learning. It appears incremental, since it builds on existing ELBO-based frameworks.
The paper tackles inefficient training in multimodal generative models by proposing a novel objective function based on the Jensen-Shannon divergence. The objective efficiently approximates the unimodal and joint posteriors, is theoretically proven to optimize an ELBO, and shows advantages in unsupervised generative learning tasks.
Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.
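To make the central quantity concrete, the following is a minimal sketch of the Jensen-Shannon divergence for multiple distributions, under simplifying assumptions that are not taken from the abstract: the distributions are discrete (lists of probabilities) rather than the continuous posteriors a generative model would learn, and the reference distribution is the weighted mixture of the arguments, as in the standard generalized JS divergence. Whether this mixture matches the paper's dynamic prior exactly is an assumption. The names `q_image`, `q_text`, and `prior` are hypothetical stand-ins for two unimodal posteriors and a fixed prior.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    # Terms with p_k == 0 contribute nothing to the sum.
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def mixture(dists, weights):
    """Weighted arithmetic mean of the distributions.

    Assumption: this mixture plays the role of the 'dynamic prior',
    because it changes whenever any of the input distributions changes.
    """
    size = len(dists[0])
    return [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(size)]


def js_divergence(dists, weights):
    """Generalized JS divergence: sum_i w_i * KL(p_i || sum_j w_j * p_j)."""
    reference = mixture(dists, weights)
    return sum(w * kl_divergence(d, reference) for w, d in zip(weights, dists))


# Hypothetical toy inputs: two unimodal posteriors and a uniform prior
# over a three-state latent variable.
q_image = [0.7, 0.2, 0.1]
q_text = [0.6, 0.3, 0.1]
prior = [1 / 3, 1 / 3, 1 / 3]
weights = [1 / 3, 1 / 3, 1 / 3]

value = js_divergence([q_image, q_text, prior], weights)
```

The divergence is zero when all inputs coincide and grows as they disagree, so minimizing it pulls the unimodal distributions and the prior toward a shared reference in a single term, instead of requiring a separate regularizer for each subset of modalities.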