PhotoWCT$^2$: Compact Autoencoder for Photorealistic Style Transfer Resulting from Blockwise Training and Skip Connections of High-Frequency Residuals
This work addresses the inefficiency of existing models for photorealistic style transfer, enabling faster processing and higher resolution support, though it is incremental in improving model compactness.
The paper tackles the problem of photorealistic style transfer by introducing a compact autoencoder model, PhotoWCT^2, which reduces parameters by 30.3%, supports 4K resolution, and achieves faster stylization while maintaining state-of-the-art stylization strength and photorealism.
Photorealistic style transfer is an image editing task with the goal to modify an image to match the style of another image while ensuring the result looks like a real photograph. A limitation of existing models is that they have many parameters, which in turn prevents their use for larger image resolutions and leads to slower run-times. We introduce two mechanisms that enable our design of a more compact model that we call PhotoWCT$^2$, which preserves state-of-art stylization strength and photorealism. First, we introduce blockwise training to perform coarse-to-fine feature transformations that enable state-of-art stylization strength in a single autoencoder in place of the inefficient cascade of four autoencoders used in PhotoWCT. Second, we introduce skip connections of high-frequency residuals in order to preserve image quality when applying the sequential coarse-to-fine feature transformations. Our PhotoWCT$^2$ model requires fewer parameters (e.g., 30.3\% fewer) while supporting higher resolution images (e.g., 4K) and achieving faster stylization than existing models.