LG | ML | Jun 15, 2020

Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence

arXiv:2006.08242v3 | 96 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of learning from multiple data types, a problem of interest to researchers in multimodal machine learning. It appears incremental because it builds on existing ELBO-based frameworks.

The paper tackles inefficient training in multimodal generative models by proposing a novel objective function based on the Jensen-Shannon divergence. The objective efficiently approximates the unimodal and joint posteriors, is theoretically proven to optimize an ELBO, and shows advantages over previous work in unsupervised tasks.

Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.
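To make "Jensen-Shannon divergence for multiple distributions" concrete, here is a minimal sketch of the standard weighted, mixture-based definition, JS_pi(p_1, ..., p_M) = sum_i pi_i * KL(p_i || sum_j pi_j p_j), evaluated on small discrete distributions with NumPy. It is an illustration only, not the paper's mmJSD objective: the abstract describes a divergence over latent posteriors with a dynamic prior, which this toy example does not attempt to reproduce. The function names `kl_divergence` and `js_divergence` are hypothetical, chosen for this example.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; entries with p == 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(dists, weights=None):
    """Weighted Jensen-Shannon divergence of M discrete distributions.

    dists:   array-like of shape (M, K), each row a probability vector.
    weights: optional positive weights of shape (M,) summing to 1
             (assumed uniform when omitted).
    """
    dists = np.asarray(dists, dtype=float)
    m = dists.shape[0]
    if weights is None:
        weights = np.full(m, 1.0 / m)
    weights = np.asarray(weights, dtype=float)
    # The reference distribution is the weighted mixture of all inputs.
    mixture = weights @ dists
    return float(sum(w * kl_divergence(p, mixture) for w, p in zip(weights, dists)))


# Three toy "unimodal" distributions over four outcomes (made-up numbers).
toy = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
print(js_divergence(toy))
```

The divergence is zero when all inputs coincide and, for two distributions with equal weights and disjoint support, reaches its maximum of log 2 (in nats).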

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
