FurryGAN: High Quality Foreground-aware Image Synthesis
This work addresses a challenging problem in image synthesis for computer vision applications, but it appears incremental as it builds on existing masked blending approaches with specific improvements.
The paper tackles foreground-aware image synthesis, which involves generating images together with their foreground masks, and addresses the trivial solution in which the masks become completely full or empty. The result is FurryGAN, a method that produces realistic images with detailed alpha masks covering hair, fur, and whiskers in an unsupervised manner.
Foreground-aware image synthesis aims to generate images as well as their foreground masks. A common approach is to formulate an image as a masked blending of a foreground image and a background image. This is challenging because training is prone to reaching a trivial solution in which one image overwhelms the other, i.e., the masks become completely full or empty, and the foreground and background are not meaningfully separated. We present FurryGAN with three key components: 1) requiring both the foreground image and the composite image to be realistic, 2) designing a mask as a combination of coarse and fine masks, and 3) guiding the generator with an auxiliary mask predictor in the discriminator. Our method produces realistic images with remarkably detailed alpha masks that cover hair, fur, and whiskers in a fully unsupervised manner.
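The masked blending formulation and its trivial solutions can be illustrated with a minimal sketch. This is not the FurryGAN implementation: the function names `blend` and `mask_coverage`, the flat-list image representation, and the toy pixel values are all assumptions made for illustration, and the standard per-pixel alpha blend `mask * foreground + (1 - mask) * background` is assumed as the blending rule.

```python
def blend(foreground, background, mask):
    """Per-pixel alpha blend: mask * foreground + (1 - mask) * background.

    All three inputs are flat lists of floats of equal length
    (a hypothetical stand-in for image tensors).
    """
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("foreground, background and mask must have the same length")
    return [m * f + (1.0 - m) * b for f, b, m in zip(foreground, background, mask)]


def mask_coverage(mask):
    """Mean mask value; values at 0 or 1 indicate an empty or full mask."""
    return sum(mask) / len(mask)


# Toy 4-pixel grayscale "images" (hypothetical values, illustration only).
fg = [0.5, 0.5, 0.5, 0.5]
bg = [0.25, 0.25, 0.25, 0.25]

soft_mask = [1.0, 0.5, 0.0, 0.0]   # meaningful separation with a soft edge
full_mask = [1.0] * 4              # degenerate: foreground overwhelms background
empty_mask = [0.0] * 4             # degenerate: background overwhelms foreground

composite = blend(fg, bg, soft_mask)
only_fg = blend(fg, bg, full_mask)    # identical to fg; background is ignored
only_bg = blend(fg, bg, empty_mask)   # identical to bg; foreground is ignored
```

With a full or empty mask the composite reduces to a single layer, so a loss that judges only the composite cannot tell whether the two layers were meaningfully separated. This is the degenerate case the three components listed above are meant to prevent.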