Increasing the Generalisation Capacity of Conditional VAEs
The work addresses structured-prediction tasks in machine learning, but it appears incremental because it builds on the existing conditional VAE framework.
The paper tackles one-to-many mappings in supervised learning by proposing a method to increase the generalisation capacity of conditional variational autoencoders, and reports significantly higher generalisation capability on the Cornell Robot Grasping dataset and on modified versions of MNIST and Fashion-MNIST.
We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle such structured-prediction tasks by means of latent variables. We propose to incentivise informative latent representations for increasing the generalisation capacity of conditional variational autoencoders. To this end, we modify the latent variable model by defining the likelihood as a function of the latent variable only and introduce an expressive multimodal prior to enable the model to capture semantically meaningful features of the data. To validate our approach, we train our model on the Cornell Robot Grasping dataset and on modified versions of MNIST and Fashion-MNIST, obtaining results that show a significantly higher generalisation capability.
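As a minimal sketch of the kind of objective described above, and not the authors' implementation, the following pure-Python example estimates an evidence lower bound in which the likelihood depends only on the latent variable z and the prior over z is multimodal. The Gaussian-mixture form of the prior, the unit-variance Gaussian likelihood, and the names log_normal, log_mixture_prior, elbo_estimate, and decode are all assumptions introduced for illustration. In a trained model the prior parameters would be produced from the input x and the posterior parameters from (x, y), rather than fixed by hand as they are here.

```python
import math
import random


def log_normal(v, mean, std):
    """Log-density of a diagonal Gaussian evaluated at the vector v."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((vi - m) / s) ** 2
        for vi, m, s in zip(v, mean, std)
    )


def log_mixture_prior(z, weights, means, stds):
    """Log-density of an assumed Gaussian-mixture prior p(z|x).

    Uses log-sum-exp over the components for numerical stability.
    """
    terms = [
        math.log(w) + log_normal(z, m, s)
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def elbo_estimate(y, q_mean, q_std, prior, decode, n_samples=64, rng=None):
    """Monte Carlo estimate of E_q[log p(y|z)] - KL(q(z|x,y) || p(z|x)).

    The KL term against a mixture has no closed form, so it is estimated
    from the same samples as log q(z|x,y) - log p(z|x).
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        # Reparameterised sample from the Gaussian posterior q(z|x,y).
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(q_mean, q_std)]
        # The likelihood sees z only, never the input x (assumed unit variance).
        log_lik = log_normal(y, decode(z), [1.0] * len(y))
        kl_term = log_normal(z, q_mean, q_std) - log_mixture_prior(z, *prior)
        total += log_lik - kl_term
    return total / n_samples


# Hypothetical two-mode prior over a one-dimensional latent variable.
prior = ([0.5, 0.5], [[-2.0], [2.0]], [[0.5], [0.5]])


def decode(z):
    """Hypothetical stand-in for a decoder network: maps z to the mean of y."""
    return [2.0 * z[0]]


# A posterior centred on the right-hand mode, scored against the target y = 4.
value = elbo_estimate([4.0], q_mean=[2.0], q_std=[0.3], prior=prior, decode=decode)
print(value)
```

Because the decoder receives only z, any information about the input that is needed to reconstruct y has to pass through the latent variable, which is one way to read the incentive for informative latent representations. The two prior modes stand in for two distinct, equally valid solutions to the same instance.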