Multilinear Latent Conditioning for Generating Unseen Attribute Combinations
The work addresses a key limitation of conditional generative models in tasks such as image synthesis, although, as an extension of cVAE, it appears incremental.
The paper tackles the inability of deep generative models to generalize to unseen attribute combinations, such as generating a smiling woman after seeing only a smiling man. It introduces a multilinear latent conditioning framework that captures multiplicative interactions between attributes, and it demonstrates the framework's efficacy on the MNIST, Fashion-MNIST, and CelebA datasets.
Deep generative models rely on their inductive bias to facilitate generalization, especially for problems with high-dimensional data such as images. However, empirical studies have shown that variational autoencoders (VAEs) and generative adversarial networks (GANs) lack the generalization ability that occurs naturally in human perception. For example, humans can visualize a woman smiling after only seeing a smiling man. In contrast, the standard conditional VAE (cVAE) is unable to generate unseen attribute combinations. To this end, we extend cVAE by introducing a multilinear latent conditioning framework that captures the multiplicative interactions between the attributes. We implement two variants of our model and demonstrate their efficacy on MNIST, Fashion-MNIST and CelebA. Altogether, we design a novel conditioning framework that can be used with any architecture to synthesize unseen attribute combinations.
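To make "multiplicative interactions between the attributes" concrete, the following is a minimal NumPy sketch of one generic way to combine two attribute vectors multiplicatively: a low-rank bilinear map. It is not the paper's implementation, and the abstract does not specify the two variants. The function name, the factor matrices A, B, C, the sizes, and the attribute encodings are all assumptions made for illustration.

```python
import numpy as np

def multiplicative_condition(a, b, A, B, C):
    """Combine two attribute vectors through a low-rank bilinear map.

    a: (Da,) and b: (Db,) are attribute encodings (e.g. one-hot).
    A: (R, Da), B: (R, Db), C: (K, R) are hypothetical learnable factors.
    Returns a (K,) conditioning vector.
    """
    # Project each attribute into a shared rank-R space, multiply element-wise,
    # then map the product to the K-dimensional conditioning space.
    return C @ ((A @ a) * (B @ b))

rng = np.random.default_rng(0)
Da, Db, R, K = 2, 2, 3, 4  # illustrative sizes only
A = rng.normal(size=(R, Da))
B = rng.normal(size=(R, Db))
C = rng.normal(size=(K, R))

smiling = np.array([0.0, 1.0])  # hypothetical encoding of "smiling"
woman = np.array([0.0, 1.0])    # hypothetical encoding of "woman"
cond = multiplicative_condition(smiling, woman, A, B, C)

# The same map written as an explicit third-order tensor contraction:
# W[k, i, j] = sum_r C[k, r] * A[r, i] * B[r, j]
W = np.einsum("kr,ri,rj->kij", C, A, B)
cond_full = np.einsum("kij,i,j->k", W, smiling, woman)
```

In this sketch the factors A and B are shared across all values of each attribute, so a pair of attribute values never observed together still maps to a well-defined conditioning vector. A purely additive or concatenation-based conditioning would contain no product term between a and b. Whether such a vector yields good samples for unseen combinations is an empirical question.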