LG, ML · May 30, 2019

Cross-modal Variational Auto-encoder with Distributed Latent Spaces and Associators

arXiv:1905.12867v1 · 7 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of associating data across different modalities (e.g., vision and sound), which is relevant to researchers in multimodal AI. It appears to be an incremental advance, since it builds on existing variational auto-encoder frameworks.

The paper tackles cross-modal data association with a structure of multiple variational auto-encoders linked by variational associators, inspired by associative learning in the brain. The structure associates even heterogeneous modal data and is validated on visual and auditory datasets.
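The "distributed latent spaces" idea can be pictured as one auto-encoder per modality plus associators keyed by (source, target) pair. The following is a minimal plain-Python sketch under stated assumptions: toy lambdas stand in for trained networks, each encoder is assumed to return only a latent mean, and the names `ModalityVAE`, `CrossModalNetwork`, `add_modality`, `add_associator` and `translate` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]


@dataclass
class ModalityVAE:
    """Hypothetical stand-in for one modality's auto-encoder (encode returns a latent mean)."""
    encode: Callable[[Vector], Vector]
    decode: Callable[[Vector], Vector]


@dataclass
class CrossModalNetwork:
    """One VAE per modality (its own latent space) plus associators between latent spaces."""
    vaes: Dict[str, ModalityVAE] = field(default_factory=dict)
    associators: Dict[Tuple[str, str], Callable[[Vector], Vector]] = field(default_factory=dict)

    def add_modality(self, name: str, vae: ModalityVAE) -> None:
        # Registering a new modality leaves existing VAEs and associators untouched.
        self.vaes[name] = vae

    def add_associator(self, src: str, dst: str, fn: Callable[[Vector], Vector]) -> None:
        # An associator only makes sense between two registered latent spaces.
        if src not in self.vaes or dst not in self.vaes:
            raise KeyError(f"unknown modality in ({src}, {dst})")
        self.associators[(src, dst)] = fn

    def translate(self, x: Vector, src: str, dst: str) -> Vector:
        """Encode x in modality src, map the latent into dst's latent space, decode as dst."""
        z_src = self.vaes[src].encode(x)
        z_dst = self.associators[(src, dst)](z_src)
        return self.vaes[dst].decode(z_dst)


# Toy usage: 1-D latent spaces with hand-written "networks".
net = CrossModalNetwork()
net.add_modality("image", ModalityVAE(encode=lambda x: [sum(x) / len(x)], decode=lambda z: [z[0]] * 4))
net.add_modality("audio", ModalityVAE(encode=lambda x: [max(x)], decode=lambda z: [z[0], -z[0]]))
net.add_associator("image", "audio", lambda z: [2.0 * z[0]])
print(net.translate([1.0, 2.0, 3.0, 4.0], "image", "audio"))  # prints [5.0, -5.0]
```

In this sketch, adding a third modality means registering one more VAE and one associator to or from an existing modality; nothing already registered has to change, which is the kind of extensibility the summary describes.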

In this paper, we propose a novel structure for cross-modal data association, inspired by recent research on the associative learning structure of the brain. We formulate the cross-modal association in a Bayesian inference framework realized by a deep neural network with multiple variational auto-encoders and variational associators. The variational associators transfer the latent spaces between auto-encoders that represent different modalities. The proposed structure successfully associates even heterogeneous modal data and easily incorporates additional modalities into the entire network via the proposed cross-modal associator. Furthermore, the proposed structure can be trained with only a small amount of paired data, since the auto-encoders can be trained in an unsupervised manner. Through experiments, the effectiveness of the proposed structure is validated on various datasets, including visual and auditory data.
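To make the "variational" part of an associator concrete, here is a minimal sketch of one loss evaluation on a paired example. It assumes that each modality's encoder outputs a diagonal Gaussian (mean and log-variance), that the associator is a single affine layer predicting a Gaussian in the target latent space, and that the objective is the KL divergence from that prediction to the target modality's posterior. The names `LinearAssociator` and `associator_loss`, the linear form, and the KL direction are illustrative assumptions; the paper's actual associator architecture and objective may differ.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def reparameterize(mu: Vector, logvar: Vector, rng: random.Random) -> Vector:
    """Draw z = mu + exp(logvar / 2) * eps with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu_q: Vector, logvar_q: Vector, mu_p: Vector, logvar_p: Vector) -> float:
    """KL(q || p) for diagonal Gaussians q = N(mu_q, exp(logvar_q)), p = N(mu_p, exp(logvar_p))."""
    return sum(
        0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
        for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p)
    )


class LinearAssociator:
    """Hypothetical associator: maps a source latent z_a to a Gaussian over the target latent space."""

    def __init__(self, w_mu: Matrix, b_mu: Vector, w_lv: Matrix, b_lv: Vector):
        self.w_mu, self.b_mu, self.w_lv, self.b_lv = w_mu, b_mu, w_lv, b_lv

    @staticmethod
    def _affine(w: Matrix, b: Vector, z: Vector) -> Vector:
        # Computes w @ z + b with plain lists.
        return [sum(wij * zj for wij, zj in zip(row, z)) + bi for row, bi in zip(w, b)]

    def __call__(self, z_a: Vector) -> Tuple[Vector, Vector]:
        return self._affine(self.w_mu, self.b_mu, z_a), self._affine(self.w_lv, self.b_lv, z_a)


def associator_loss(assoc: LinearAssociator, mu_a: Vector, logvar_a: Vector,
                    mu_b: Vector, logvar_b: Vector, rng: random.Random) -> float:
    """One-sample loss on a paired example (x_a, x_b).

    Assumes both VAEs were already trained without pairs and are frozen, so
    (mu_a, logvar_a) and (mu_b, logvar_b) are their encoder outputs; only the
    associator's weights would be fitted on the small paired set.
    """
    z_a = reparameterize(mu_a, logvar_a, rng)
    mu_hat, logvar_hat = assoc(z_a)
    return gaussian_kl(mu_hat, logvar_hat, mu_b, logvar_b)


# Toy usage: an identity-mean associator between two 2-D latent spaces.
assoc = LinearAssociator([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
loss = associator_loss(assoc, [0.3, -0.2], [-2.0, -2.0], [0.5, 0.1], [0.0, 0.0], random.Random(0))
print(loss)
```

Because the auto-encoders are frozen in this sketch, the paired data only has to constrain the small associator, which is one way to read the abstract's claim that little paired data is needed.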
