Fine-Grained Expression Manipulation via Structured Latent Space
This work addresses the challenge of capturing and manipulating fine-grained expression details in facial images. The contribution is incremental: it builds on existing GAN-based methods by adding a structured latent space and continuous expression labels.
The paper tackles fine-grained facial expression manipulation by proposing an end-to-end expression-guided generative adversarial network (EGGAN) that takes structured latent codes and continuous expression labels as input and generates images with the expected expressions. The method can manipulate fine-grained details and generate continuous intermediate expressions.
Fine-grained facial expression manipulation is a challenging problem because fine-grained expression details are difficult to capture. Most existing expression manipulation methods rely on discrete expression labels, which mainly edit global expressions and ignore fine details. To address this limitation, we propose an end-to-end expression-guided generative adversarial network (EGGAN), which takes structured latent codes and continuous expression labels as input to generate images with the expected expressions. Specifically, we adopt an adversarial autoencoder to map a source image into a structured latent space. Then, given the source latent code and the target expression label, we employ a conditional GAN to generate a new image with the target expression. Moreover, we introduce a perceptual loss and a multi-scale structural similarity loss to preserve identity and global shape during generation. Extensive experiments show that our method can manipulate fine-grained expressions and generate continuous intermediate expressions between the source and target expressions.
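To illustrate how continuous labels could yield intermediate expressions, the sketch below linearly interpolates between a source and a target label vector. The abstract does not state how intermediate labels are produced, so linear interpolation, the function name interpolate_labels, and the two-dimensional label layout are all assumptions made for illustration. In a full pipeline, each resulting label would be passed to the conditional generator together with the source latent code.

```python
from typing import List, Sequence


def interpolate_labels(
    source: Sequence[float], target: Sequence[float], steps: int
) -> List[List[float]]:
    """Return `steps` label vectors moving linearly from source to target.

    Hypothetical helper: the paper's abstract does not specify how
    intermediate labels are chosen; linear interpolation is an assumption.
    Both endpoints are included in the result.
    """
    if len(source) != len(target):
        raise ValueError("source and target labels must have the same length")
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")

    labels = []
    for i in range(steps):
        alpha = i / (steps - 1)  # 0.0 at the source, 1.0 at the target
        labels.append(
            [(1 - alpha) * s + alpha * t for s, t in zip(source, target)]
        )
    return labels


# Example with a made-up two-dimensional continuous expression label.
path = interpolate_labels([0.0, 1.0], [1.0, 0.0], steps=3)
```

Here path holds three labels: the source label, a midpoint, and the target label. A discrete-label method would only have the two endpoints available, which is the limitation the continuous labels are meant to remove.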