ColoristaNet for Photorealistic Video Style Transfer
ColoristaNet addresses the problem of unrealistic stylization in videos, with applications such as video editing and entertainment. The contribution appears incremental, since it builds on existing methods with specific improvements.
The paper tackles photorealistic video style transfer by proposing ColoristaNet, a self-supervised framework that avoids the Gram loss and uses decoupled instance normalization to transfer style while preserving photorealism. The authors report better stylization effects than state-of-the-art algorithms.
Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while keeping photorealism. In this paper, we argue that the summary statistics matching scheme in existing algorithms is what leads to unrealistic stylization. To avoid employing the popular Gram loss, we propose a self-supervised style transfer framework, which contains a style removal part and a style restoration part. The style removal network removes the original image styles, and the style restoration network recovers image styles in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization, which decomposes feature transformation into style whitening and restylization. It works well in ColoristaNet and transfers image styles efficiently while keeping photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet achieves better stylization effects than state-of-the-art algorithms.
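
The two-step decomposition described above (style whitening followed by restylization) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that "whitening" means removing per-channel mean and standard deviation, as in ordinary instance normalization, and that "restylization" means re-applying the per-channel statistics of a style feature map. The function names `channel_stats`, `style_whiten`, and `restylize`, and the toy feature values, are hypothetical and chosen only for illustration. Each feature map is represented as a list of flattened channels.

```python
import math

def channel_stats(channel, eps=1e-5):
    """Return (mean, std) of one flattened feature channel."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((x - mean) ** 2 for x in channel) / n
    return mean, math.sqrt(var + eps)

def style_whiten(features, eps=1e-5):
    """Step 1 (assumed form): strip per-channel statistics,
    leaving each channel with zero mean and roughly unit variance."""
    out = []
    for ch in features:
        mean, std = channel_stats(ch, eps)
        out.append([(x - mean) / std for x in ch])
    return out

def restylize(whitened, style_features, eps=1e-5):
    """Step 2 (assumed form): re-apply the per-channel mean and std
    measured on the style features."""
    out = []
    for ch, style_ch in zip(whitened, style_features):
        s_mean, s_std = channel_stats(style_ch, eps)
        out.append([x * s_std + s_mean for x in ch])
    return out

# Toy features: 2 channels, 4 spatial positions each (hypothetical values).
content = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.0, 14.0, 14.0]]
style = [[5.0, 7.0, 5.0, 7.0], [-1.0, 0.0, 1.0, 0.0]]

stylized = restylize(style_whiten(content), style)
```

After both steps, each channel of `stylized` carries the mean and (approximately) the standard deviation of the corresponding style channel, while the relative spatial pattern of the content channel is preserved. Keeping the two steps as separate functions mirrors the decoupling described in the abstract; how ColoristaNet actually parameterizes each step is not specified here.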