Context-Preserving Two-Stage Video Domain Translation for Portrait Stylization
This work addresses video portrait stylization for practical real-world applications, offering an incremental improvement over prior image-based methods.
The paper tackles the problem of generating temporally coherent stylized videos from real human face videos, achieving real-time performance with a latency of 0.011 seconds per frame and only 5.6M parameters.
Portrait stylization, which translates a real human face image into an artistically stylized image, has attracted considerable interest, and many prior works have shown impressive quality in recent years. However, despite their remarkable performance on image-level translation tasks, prior methods show unsatisfactory results when applied to the video domain. To address this issue, we propose a novel two-stage video translation framework with an objective function that drives the model to generate a temporally coherent stylized video while preserving the context of the source video. Furthermore, our model runs in real time with a latency of 0.011 seconds per frame and requires only 5.6M parameters, and is thus widely applicable to practical real-world applications.
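As a rough sanity check on the real-time claim, the reported figures can be converted into more familiar units. The sketch below is illustrative only: it assumes frames are processed one at a time, so that throughput is the reciprocal of the latency, and it assumes weights are stored as 32-bit floats, which the abstract does not state. The function names are hypothetical.

```python
# Figures quoted in the abstract.
LATENCY_S_PER_FRAME = 0.011
NUM_PARAMETERS = 5.6e6

# Assumption (not stated in the abstract): weights stored as 32-bit floats.
BYTES_PER_PARAMETER = 4


def frames_per_second(latency_s: float) -> float:
    """Throughput implied by a per-frame latency, one frame at a time."""
    return 1.0 / latency_s


def weight_size_mb(num_parameters: float, bytes_per_parameter: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


if __name__ == "__main__":
    fps = frames_per_second(LATENCY_S_PER_FRAME)
    size = weight_size_mb(NUM_PARAMETERS, BYTES_PER_PARAMETER)
    print(f"{fps:.1f} frames per second")
    print(f"{size:.1f} MB of weights")
```

Under these assumptions the latency corresponds to roughly 90 frames per second, well above common video frame rates of 24 to 60 frames per second, and the weights occupy about 22.4 MB.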