CV, Aug 29, 2023

Learning Modulated Transformation in GANs

Ceyuan Yang, Qihang Zhang, Yinghao Xu, Jiapeng Zhu, Yujun Shen, Bo Dai

arXiv:2308.15472v1, 2.81 citations, h-index: 34

Originality: Highly original

AI Analysis

This work addresses geometric deformation in generative tasks such as image and video synthesis, offering a plug-and-play improvement for state-of-the-art frameworks.

The paper tackles the limited capacity of style-based GANs to model geometric variation by introducing a modulated transformation module (MTM), which predicts spatial offsets so that convolution can be applied at variable locations. On the TaiChi dataset, it improves the FID of StyleGAN3 from 21.36 to 13.60.

The success of style-based generators largely benefits from style modulation, which helps handle the cross-instance variation within data. However, the instance-wise stochasticity is typically introduced via regular convolution, where kernels interact with features at fixed locations, limiting its capacity for modeling geometric variation. To alleviate this problem, we equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed the modulated transformation module (MTM). This module predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations for different instances, and hence offers the model an additional degree of freedom to handle geometric deformation. Extensive experiments suggest that our approach generalizes faithfully to various generative tasks, including image generation, 3D-aware image synthesis, and video generation, and is compatible with state-of-the-art frameworks without any hyper-parameter tuning. Notably, for human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometric transformation.
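To make the idea of latent-controlled sampling locations concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names (`bilinear`, `predict_offsets`, `modulated_transform_conv`) and the projection matrix `proj` are hypothetical, the map has a single channel, and the offsets come from a plain linear map of the latent code and are shared across all spatial positions. These are simplifications chosen for readability, whereas the abstract only states that offsets are predicted under the control of latent codes.

```python
import math

def bilinear(feat, y, x):
    """Sample a 2D feature map at a fractional location (zero padding outside)."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def at(i, j):
        return feat[i][j] if 0 <= i < h and 0 <= j < w else 0.0

    return ((1 - dy) * (1 - dx) * at(y0, x0) + (1 - dy) * dx * at(y0, x0 + 1)
            + dy * (1 - dx) * at(y0 + 1, x0) + dy * dx * at(y0 + 1, x0 + 1))

def predict_offsets(latent, proj):
    """Hypothetical offset head: linear map from the latent code to one
    (dy, dx) pair per kernel tap. An assumption, not the paper's design."""
    flat = [sum(p * z for p, z in zip(row, latent)) for row in proj]
    return [(flat[2 * k], flat[2 * k + 1]) for k in range(len(flat) // 2)]

def modulated_transform_conv(feat, kernel, offsets):
    """3x3 convolution whose k-th tap is read at its regular grid position
    shifted by offsets[k], so the sampling locations depend on the latent."""
    h, w = len(feat), len(feat[0])
    taps = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(
                kernel[k] * bilinear(feat, y + i + oy, x + j + ox)
                for k, ((i, j), (oy, ox)) in enumerate(zip(taps, offsets))
            )
    return out

# Toy 4x4 feature map and an identity kernel (only the center tap is non-zero).
feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]
kernel = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

# Hypothetical projection: the center tap's dx equals latent[1]; all else is zero.
proj = [[0.0, 0.0] for _ in range(18)]
proj[9][1] = 1.0

# Same kernel, two latent codes, two different sampling patterns.
out_a = modulated_transform_conv(feat, kernel, predict_offsets([0.0, 0.0], proj))
out_b = modulated_transform_conv(feat, kernel, predict_offsets([0.0, 0.5], proj))
print(out_a[1][1], out_b[1][1])  # 5.0 with zero offsets, 5.5 when shifted by half a pixel
```

With a zero latent the layer reduces to a regular convolution, while a non-zero latent shifts where the kernel reads its inputs. That extra, instance-specific degree of freedom is what the abstract refers to. A practical version would typically predict per-position offsets from the features as well and rely on an optimized deformable-convolution operator.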
