CV · Apr 2, 2018

SyncGAN: Synchronize the Latent Space of Cross-modal Generative Adversarial Networks

arXiv:1804.00410v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating aligned data across different modalities, for example in multimedia applications, though it appears incremental because it builds on existing GAN frameworks.

The authors tackle cross-modal generation by proposing SyncGAN, which learns a synchronous latent space for heterogeneous data such as images and sounds. This lets the model generate paired data from identical noise and train in a semi-supervised setting.

Generative adversarial networks (GANs) have achieved impressive success in cross-domain generation, but they face difficulty in cross-modal generation because heterogeneous data lack a common distribution. Most existing conditional cross-modal GANs adopt a one-directional transfer strategy and have achieved preliminary success in text-to-image transfer. Instead of learning the transfer between different modalities, we aim to learn a synchronous latent space representing the cross-modal common concept. This work proposes a novel network component, the synchronizer, which judges whether paired data is synchronous/corresponding or not and thereby constrains the latent space of the generators in the GANs. Our GAN model, named SyncGAN, can successfully generate synchronous data (e.g., a pair of image and sound) from identical random noise. To transform data from one modality to another, we recover the latent code by inverting the mapping of one generator and use that code to generate data of a different modality. In addition, the proposed model can achieve semi-supervised learning, which makes it more flexible for practical applications.
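
The cross-modal transfer step described in the abstract (recover the latent code by inverting one generator, then feed that code to the other generator) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the two "generators" are small fixed linear maps rather than trained networks, the names IMAGE_WEIGHTS, SOUND_WEIGHTS, invert_image_generator, and image_to_sound are hypothetical, and the inversion uses plain gradient descent on a squared reconstruction error, which may differ from the authors' procedure.

```python
# Toy illustration only: linear stand-ins for two generators that share
# one latent space. These weights are hypothetical, not from SyncGAN.
IMAGE_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # latent (2) -> "image" (3)
SOUND_WEIGHTS = [[2.0, -1.0], [0.5, 1.0]]             # latent (2) -> "sound" (2)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def generate_image(z):
    """Stand-in for the image generator."""
    return matvec(IMAGE_WEIGHTS, z)


def generate_sound(z):
    """Stand-in for the sound generator."""
    return matvec(SOUND_WEIGHTS, z)


def invert_image_generator(target_image, steps=200, learning_rate=0.3):
    """Recover a latent code z such that generate_image(z) is close to target_image.

    Minimizes 0.5 * ||G_image(z) - target||^2 by gradient descent.
    For a linear generator the gradient is W^T (W z - target).
    """
    z = [0.0, 0.0]
    weights_t = transpose(IMAGE_WEIGHTS)
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate_image(z), target_image)]
        gradient = matvec(weights_t, residual)
        z = [zi - learning_rate * gi for zi, gi in zip(z, gradient)]
    return z


def image_to_sound(image):
    """Transfer across modalities through the shared latent code."""
    return generate_sound(invert_image_generator(image))


if __name__ == "__main__":
    true_z = [0.5, -1.5]
    image = generate_image(true_z)
    print("recovered latent:", invert_image_generator(image))
    print("transferred sound:", image_to_sound(image))
    print("sound from true latent:", generate_sound(true_z))
```

In this toy setting the recovered code matches the original latent vector because the linear image map is injective. With real nonlinear generators the inversion is only approximate, and the transfer is meaningful only if the two generators have been trained to share a synchronized latent space, which is the role the abstract assigns to the synchronizer.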

