Zero-shot Synthesis with Group-Supervised Learning
This paper addresses the challenge of zero-shot synthesis in computer vision, proposing a framework for generating new images from attribute combinations not seen during training.
The paper tackles the problem of enabling neural networks to synthesize novel visual objects with different attributes without having seen examples of them. It proposes Group-Supervised Learning (GSL), which decomposes inputs into disentangled representations whose components can be recombined. The resulting model, GZS-Net, is reported to outperform state-of-the-art methods on benchmarks.
Visual cognition of primates is superior to that of artificial neural networks in its ability to 'envision' a visual object, even a newly-introduced one, in different attributes including pose, position, color, texture, etc. To help neural networks envision objects with different attributes, we propose a family of objective functions, expressed on groups of examples, as a novel learning framework that we term Group-Supervised Learning (GSL). GSL allows us to decompose inputs into a disentangled representation with swappable components that can be recombined to synthesize new samples. For instance, images of red boats and blue cars can be decomposed and recombined to synthesize novel images of red cars. We propose an implementation based on an auto-encoder, termed group-supervised zero-shot synthesis network (GZS-Net), trained with our learning framework, that can produce a high-quality red car even if no such example is witnessed during training. We test our model and learning framework on existing benchmarks, in addition to a new dataset that we open-source. We qualitatively and quantitatively demonstrate that GZS-Net trained with GSL outperforms state-of-the-art methods.
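The idea of swappable components can be illustrated with a minimal sketch. It is not the GZS-Net implementation: it assumes, purely for illustration, that a latent code is a flat list split into fixed-size chunks, one per attribute. The names `ATTRIBUTE_SLICES` and `swap_attribute`, the chunk sizes, and the stand-in latent values are all hypothetical, and the encoder and decoder are omitted.

```python
# Hypothetical latent layout: one fixed-size chunk per attribute.
# Attribute names and chunk sizes are illustrative assumptions.
ATTRIBUTE_SLICES = {
    "identity": slice(0, 4),
    "color": slice(4, 8),
    "pose": slice(8, 12),
}


def swap_attribute(z_a, z_b, attribute):
    """Return copies of z_a and z_b with the chunk for `attribute` exchanged."""
    part = ATTRIBUTE_SLICES[attribute]
    z_a_new, z_b_new = list(z_a), list(z_b)
    z_a_new[part] = z_b[part]
    z_b_new[part] = z_a[part]
    return z_a_new, z_b_new


# Stand-in latent codes; in practice these would come from a trained encoder.
z_red_boat = [float(i) for i in range(12)]
z_blue_car = [float(i) for i in range(100, 112)]

# Exchanging only the color chunk yields codes for a blue boat and a red car.
z_blue_boat, z_red_car = swap_attribute(z_red_boat, z_blue_car, "color")
```

In this sketch, `z_red_car` keeps the identity and pose chunks of the car and takes the color chunk of the boat; a decoder would then map it back to an image. The swap is only meaningful if training has actually confined each attribute to its own chunk, which is the kind of disentanglement the group-level objectives are meant to encourage.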