Learning to Manipulate Individual Objects in an Image
The work addresses fine-grained image editing for applications such as content creation and data augmentation, though its contribution to unsupervised generative modeling is incremental.
The paper enables object-centric manipulation of images without object-level annotations by training a generative model with independent, localized latent factors, which gives control over individual objects in synthesized images.
We describe a method to train a generative model with latent factors that are (approximately) independent and localized. This means that perturbing the latent variables affects only local regions of the synthesized image, corresponding to objects. Unlike other unsupervised generative models, ours enables object-centric manipulation, without requiring object-level annotations, or any form of annotation for that matter. The key to our method is the combination of spatial disentanglement, enforced by a Contextual Information Separation loss, and perceptual cycle-consistency, enforced by a loss that penalizes changes in the image partition in response to perturbations of the latent factors. We test our method's ability to allow independent control of spatial and semantic factors of variability on existing datasets and also introduce two new ones that highlight the limitations of current methods.
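The locality property described above (perturbing one latent factor should change only the image region of the corresponding object) can be illustrated with a toy check. The sketch below is not the paper's model or its losses: `toy_generator`, `locality_score`, the linear "influence map" generator, and the 4x4 two-region layout are all hypothetical constructions chosen only to make the idea concrete.

```python
import numpy as np

def toy_generator(z, influence):
    """Hypothetical linear generator (an assumption, not the paper's network).

    z:         (K,) latent factors
    influence: (K, H, W) map of how strongly each factor affects each pixel
    returns:   (H, W) synthesized image
    """
    return np.tensordot(z, influence, axes=1)

def locality_score(z, influence, regions, k, delta=0.5):
    """Fraction of the total absolute image change that falls inside
    regions[k] when only z[k] is perturbed by delta."""
    z_pert = z.copy()
    z_pert[k] += delta
    diff = np.abs(toy_generator(z_pert, influence) - toy_generator(z, influence))
    total = diff.sum()
    return float((diff * regions[k]).sum() / total) if total > 0 else 0.0

# Two disjoint "object" regions: left half and right half of a 4x4 image.
regions = np.zeros((2, 4, 4))
regions[0, :, :2] = 1.0
regions[1, :, 2:] = 1.0

z = np.array([0.2, 0.8])

# Localized case: each factor influences only its own region.
localized_score = locality_score(z, regions, regions, k=0)

# Entangled case: factor 0 also leaks into region 1 at half strength.
leaky = regions.copy()
leaky[0] += 0.5 * regions[1]
leaky_score = locality_score(z, leaky, regions, k=0)
```

In this toy setting the localized generator puts all of the change inside the perturbed factor's region, so `localized_score` is 1, while the leaky generator yields a score of about 2/3. A score of this kind is only a diagnostic for the property the abstract describes; the method itself enforces the property through its training losses.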