ReCoNet: Real-time Coherent Video Style Transfer Network
ReCoNet addresses the challenge of fast, temporally coherent video stylization for media and entertainment applications. The contribution is incremental, since it builds on existing video style transfer methods.
The paper tackles real-time video style transfer that combines high temporal consistency with good perceptual quality. It proposes ReCoNet, which adds a luminance warping constraint to the output-level temporal loss and introduces a feature-map-level temporal loss. The authors report strong qualitative and quantitative results.
Image style transfer models based on convolutional neural networks usually suffer from high temporal inconsistency when applied to videos. Some video style transfer models have been proposed to improve temporal consistency, yet they fail to guarantee fast processing speed, nice perceptual style quality and high temporal consistency at the same time. In this paper, we propose a novel real-time video style transfer model, ReCoNet, which can generate temporally coherent style transfer videos while maintaining favorable perceptual styles. A novel luminance warping constraint is added to the temporal loss at the output level to capture luminance changes between consecutive frames and increase stylization stability under illumination effects. We also propose a novel feature-map-level temporal loss to further enhance temporal consistency on traceable objects. Experimental results indicate that our model exhibits outstanding performance both qualitatively and quantitatively.
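The two temporal losses described above can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so this is one plausible reading rather than the authors' implementation. It makes several assumptions: the previous frames and feature maps have already been warped to the current frame by optical flow, a binary mask marks traceable (non-occluded) pixels, luminance uses Rec. 709 weights, and the loss is normalized by element count. The function names are hypothetical.

```python
import numpy as np

# Rec. 709 relative-luminance weights (an assumption; the abstract does not
# say how luminance is computed).
LUMA_WEIGHTS = np.array([0.2126, 0.7152, 0.0722])


def luminance(rgb):
    """Collapse an (H, W, 3) RGB image to an (H, W) luminance map."""
    return rgb @ LUMA_WEIGHTS


def output_temporal_loss(out_t, warped_out_prev, in_t, warped_in_prev, mask):
    """Output-level temporal loss with a luminance warping term.

    Images are (H, W, 3); mask is (H, W) with 1 for traceable pixels.
    The *_prev arguments are assumed to be already warped to frame t.
    """
    # Change in the stylized output between consecutive frames, per channel.
    output_change = out_t - warped_out_prev
    # Change in input luminance, which the output is allowed to follow.
    luma_change = luminance(in_t) - luminance(warped_in_prev)
    residual = output_change - luma_change[..., None]
    return float(np.sum(mask[..., None] * residual ** 2) / residual.size)


def feature_temporal_loss(feat_t, warped_feat_prev, mask):
    """Masked mean squared difference between (H, W, C) feature maps."""
    diff = feat_t - warped_feat_prev
    return float(np.sum(mask[..., None] * diff ** 2) / diff.size)
```

Under this reading, a uniform brightening of the input that the stylized output follows exactly incurs no output-level penalty, whereas a plain frame-difference loss would penalize it. That is the sense in which the luminance term can stabilize stylization under illumination changes.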