Towards Controllable and Photorealistic Region-wise Image Manipulation
This work addresses the need for adaptive and flexible image editing in generative models, offering a domain-specific solution for region-wise style manipulation.
The paper tackles controllable and photorealistic region-wise image manipulation by proposing a generative model with an auto-encoder architecture. A code consistency loss enforces disentanglement between content and style latent representations, and a content alignment loss keeps foreground edits from disturbing the background, enabling effective region-wise style transfer without extra annotations.
Adaptive and flexible image editing is a desirable capability of modern generative models. In this work, we present a generative model with an auto-encoder architecture for per-region style manipulation. We apply a code consistency loss to enforce an explicit disentanglement between content and style latent representations, making the content and style of generated samples consistent with their corresponding content and style references. The model is also constrained by a content alignment loss to ensure that foreground editing does not interfere with background content. As a result, given user-provided masks of the regions of interest, our model supports region-wise style transfer on the foreground. Notably, our model requires no extra annotations such as semantic labels and relies only on self-supervision. Extensive experiments show the effectiveness of the proposed method and demonstrate the flexibility of the proposed model in various applications, including region-wise style editing, latent space interpolation, and cross-domain style transfer.
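To make the two constraints concrete, the following is a minimal NumPy sketch of how such losses could be written. It is an illustration under stated assumptions, not the formulation used in the paper: the abstract gives no equations, so the L1 distance, the normalization, the binary mask convention (1 = foreground, 0 = background), and the function names `code_consistency_loss` and `content_alignment_loss` are all hypothetical choices. The codes passed in are assumed to come from re-encoding the generated image with the same encoder that produced the reference codes.

```python
import numpy as np


def code_consistency_loss(content_ref, style_ref, content_gen, style_gen):
    """Assumed form: L1 distance between the codes re-encoded from the
    generated image and the codes of its content and style references."""
    content_term = np.mean(np.abs(content_gen - content_ref))
    style_term = np.mean(np.abs(style_gen - style_ref))
    return float(content_term + style_term)


def content_alignment_loss(generated, content_image, mask):
    """Assumed form: mean absolute difference over background pixels only.

    generated, content_image: arrays of shape (H, W, C)
    mask: array of shape (H, W), 1 for the edited foreground, 0 for background
    """
    background = (1.0 - mask)[..., None]           # (H, W, 1), broadcast over channels
    diff = np.abs(generated - content_image) * background
    denom = background.sum() * generated.shape[-1]  # number of background values
    if denom == 0:
        return 0.0                                  # no background to constrain
    return float(diff.sum() / denom)
```

Under these assumptions, an edit that changes only pixels inside the mask incurs zero content alignment loss, while any change to the background is penalized in proportion to its magnitude.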