CV | LG | ML | Mar 6, 2016

Variational methods for Conditional Multimodal Deep Learning

arXiv:1603.01801v2 | 22 citations
AI Analysis

This paper addresses the challenge of generating one modality conditioned on another in multimodal deep learning. The contribution is incremental, as it builds on existing variational and multimodal methods.

The paper tackles the problem of conditional generation between modalities, such as generating faces from attributes, by learning conditional distributions using variational methods. The proposed conditional multimodal autoencoder (CMMA) produces faces that are more representative of the attributes, with qualitative and quantitative improvements over other deep generative models.

In this paper, we address the problem of conditional modality learning, whereby one is interested in generating one modality given the other. While it is straightforward to learn a joint distribution over multiple modalities using a deep multimodal architecture, we observe that such models are not very effective at conditional generation. Hence, we address the problem by learning conditional distributions between the modalities. We use variational methods to maximize the corresponding conditional log-likelihood. The resulting deep model, which we refer to as the conditional multimodal autoencoder (CMMA), forces the latent representation obtained from a single modality alone to be "close" to the joint representation obtained from multiple modalities. We use the proposed model to generate faces from attributes. We show that the faces generated from attributes using the proposed model are qualitatively and quantitatively more representative of the attributes from which they were generated than those obtained by other deep generative models. We also propose a secondary task, whereby existing faces are modified by modifying the corresponding attributes. We observe that the modifications to the face introduced by the proposed model are representative of the corresponding modifications in attributes.
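
The abstract does not state the objective explicitly. A common way to write a variational lower bound on a conditional log-likelihood is log p(x|y) >= E_q[log p(x|z,y)] - KL(q(z|x,y) || p(z|y)), where x is the generated modality (faces), y is the conditioning modality (attributes), and z is the latent code. The KL term is what would pull the attribute-only representation p(z|y) toward the joint representation q(z|x,y). The sketch below evaluates that bound for diagonal Gaussians. The function names, the Gaussian assumption, and the exact form of the decoder term are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over the latent axis.

    q plays the role of the joint posterior q(z | x, y);
    p plays the role of the attribute-only prior p(z | y).
    (Hypothetical helper, assumed for illustration.)
    """
    var_q = np.exp(logvar_q)
    var_p = np.exp(logvar_p)
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(axis=-1)

def conditional_elbo(recon_log_lik, mu_q, logvar_q, mu_p, logvar_p):
    """Lower bound on log p(x | y): reconstruction term minus the KL penalty.

    recon_log_lik is assumed to be an estimate of E_q[log p(x | z, y)],
    one value per example, computed elsewhere by a decoder.
    """
    return recon_log_lik - gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)

if __name__ == "__main__":
    zeros = np.zeros((1, 2))
    # Identical distributions: the KL penalty vanishes.
    print(gaussian_kl(zeros, zeros, zeros, zeros))
    # Means differ by 1 in each of 2 dimensions, unit variances.
    print(gaussian_kl(np.ones((1, 2)), zeros, zeros, zeros))
```

Under these assumptions, minimizing the KL term during training is one way to make the single-modality representation track the joint one, which is the property the abstract attributes to CMMA.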

