Focal Frequency Loss for Image Reconstruction and Synthesis
This work offers researchers and practitioners in image reconstruction and synthesis an incremental improvement: a loss function that raises the output quality of existing generative models.
This paper addresses the gap between real and generated images, particularly in the frequency domain, by proposing a novel focal frequency loss. The loss adaptively focuses on hard-to-synthesize frequency components, improving the perceptual quality and quantitative performance of models such as VAE, pix2pix, and SPADE, and showing further potential on StyleGAN2.
Image reconstruction and synthesis have witnessed remarkable progress thanks to the development of generative models. Nonetheless, gaps can still exist between real and generated images, especially in the frequency domain. In this study, we show that narrowing these frequency-domain gaps further improves image reconstruction and synthesis quality. We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize by down-weighting the easy ones. This objective function is complementary to existing spatial losses and offers strong resistance to the loss of important frequency information caused by the inherent bias of neural networks. We demonstrate the versatility and effectiveness of focal frequency loss in improving popular models, such as VAE, pix2pix, and SPADE, in both perceptual quality and quantitative performance. We further show its potential on StyleGAN2.
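The weighting idea described above (emphasize frequencies that are hard to synthesize, down-weight the easy ones) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the `alpha` exponent, the per-image max normalization of the weights, and the orthonormal FFT scaling are all choices made here for illustration.

```python
import numpy as np


def focal_frequency_loss(real, fake, alpha=1.0):
    """Illustrative frequency-weighted reconstruction loss.

    real, fake: float arrays of shape (..., H, W) holding images.
    alpha: hypothetical exponent; larger values focus the loss more
           sharply on the frequencies with the largest error.
    """
    # 2D discrete Fourier transform over the last two axes.
    # norm="ortho" keeps the transform energy-preserving (an assumption
    # made here so the result is comparable to a spatial MSE).
    freq_real = np.fft.fft2(real, norm="ortho")
    freq_fake = np.fft.fft2(fake, norm="ortho")

    # Squared distance between real and generated spectra at each frequency.
    dist = np.abs(freq_real - freq_fake) ** 2

    # Weight each frequency by its current error magnitude, so that
    # well-synthesized ("easy") frequencies are down-weighted.
    weight = np.sqrt(dist) ** alpha

    # Normalize weights into [0, 1] per image; guard against an all-zero error.
    max_w = weight.max(axis=(-2, -1), keepdims=True)
    weight = weight / np.where(max_w > 0, max_w, 1.0)

    return float(np.mean(weight * dist))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.random((64, 64))
    fake = real + 0.1 * rng.standard_normal((64, 64))
    print(focal_frequency_loss(real, fake, alpha=1.0))
```

In this sketch, setting `alpha=0` makes every weight equal to 1, so the loss reduces to a plain mean squared error between the spectra. In a trainable framework one would typically treat the weights as constants (no gradient through them) and add this term to an existing spatial loss with a scaling coefficient; both of those details are assumptions here rather than statements from the text above.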