LG, CV, ML
May 30, 2017

Generative Models of Visually Grounded Imagination

arXiv:1705.10762v8
154 citations
Originality: Incremental advance
AI Analysis

This work addresses visual imagination for AI systems, that is, generating images from abstract concepts. The advance is incremental because it builds on existing VAE frameworks.

The paper tackled the problem of generating images of novel semantic concepts, such as a man with pink hair, by modifying variational auto-encoders (VAEs) with a new training objective and a product-of-experts inference network. It introduced evaluation metrics for correctness, coverage, and compositionality, and showed that the proposed method outperformed existing joint image-attribute VAE methods on the MNIST-with-attributes and CelebA datasets.

It is easy for people to imagine what a man with pink hair looks like, even if they have never seen such a person before. We call the ability to create images of novel semantic concepts visually grounded imagination. In this paper, we show how we can modify variational auto-encoders to perform this task. Our method uses a novel training objective, and a novel product-of-experts inference network, which can handle partially specified (abstract) concepts in a principled and efficient way. We also propose a set of easy-to-compute evaluation metrics that capture our intuitive notions of what it means to have good visual imagination, namely correctness, coverage, and compositionality (the 3 C's). Finally, we perform a detailed comparison of our method with two existing joint image-attribute VAE methods (the JMVAE method of Suzuki et al. and the BiVCCA method of Wang et al.) by applying them to two datasets: the MNIST-with-attributes dataset (which we introduce here), and the CelebA dataset.
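The product-of-experts idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's network. It assumes each attribute contributes a diagonal-Gaussian expert over the latent code and that a standard-normal prior acts as one more expert. The function name, argument names, and array shapes are hypothetical. Under those assumptions, the product of Gaussians is again Gaussian, with precisions that add, so an unspecified attribute is handled by leaving its expert out of the product.

```python
import numpy as np


def product_of_gaussian_experts(means, variances, observed):
    """Combine diagonal-Gaussian experts with a standard-normal prior expert.

    Illustrative sketch only (names and shapes are assumptions):
      means, variances: shape (num_attributes, latent_dim)
      observed: boolean mask of shape (num_attributes,);
                False marks an attribute that is left unspecified.
    Returns the mean and variance of the combined Gaussian.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    latent_dim = means.shape[1]

    # Prior expert N(0, I): precision 1 and precision-weighted mean 0.
    precision = np.ones(latent_dim)
    weighted_mean = np.zeros(latent_dim)

    # Only observed attributes contribute; precisions of Gaussians add.
    expert_precisions = 1.0 / variances[observed]
    precision = precision + expert_precisions.sum(axis=0)
    weighted_mean = weighted_mean + (expert_precisions * means[observed]).sum(axis=0)

    combined_variance = 1.0 / precision
    combined_mean = combined_variance * weighted_mean
    return combined_mean, combined_variance


if __name__ == "__main__":
    mu = [[2.0], [4.0]]
    var = [[1.0], [0.5]]
    # Fully specified concept: both attribute experts are used.
    print(product_of_gaussian_experts(mu, var, [True, True]))
    # Partially specified concept: the second attribute is left out.
    print(product_of_gaussian_experts(mu, var, [True, False]))
```

With no attributes specified, the sketch falls back to the prior. Each additional specified attribute adds precision, so the combined distribution narrows as the concept becomes more concrete.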

