Photorealistic and Identity-Preserving Image-Based Emotion Manipulation with Latent Diffusion Models
This work addresses the problem of generating photorealistic, identity-preserving emotional expressions in images, with applications such as entertainment and social media. The contribution is incremental, however, since it builds on existing diffusion and CLIP techniques.
The paper tackles emotion manipulation in "in-the-wild" images using latent diffusion models, achieving better image quality and realism than GAN-based methods while remaining competitive on emotion translation.
In this paper, we investigate the emotion manipulation capabilities of diffusion models with "in-the-wild" images, a rather unexplored application area relative to the vast and rapidly growing literature on image-to-image translation tasks. Our proposed method builds on several pieces of prior work, the most important being Latent Diffusion models and text-driven manipulation with CLIP latents. We conduct extensive qualitative and quantitative evaluations on AffectNet, demonstrating the superiority of our approach in terms of image quality and realism, while achieving competitive emotion translation results against a variety of GAN-based counterparts. Code is released in a publicly available repository.