Realtime Data-Efficient Portrait Stylization Based On Geometric Alignment
This work addresses the challenge of maintaining geometric consistency in portrait stylization for applications on rare styles and mobile devices, representing a domain-specific incremental improvement.
The paper tackles the problem of portrait stylization by establishing geometric correlations between portraits and style samples to improve consistency and efficiency, achieving 2x training data efficiency, 100x less computational complexity, and real-time inference at 30 FPS on mobile devices.
Portrait Stylization aims to imbue portrait photos with vivid artistic effects drawn from style examples. Despite the availability of enormous training datasets and large network weights, existing methods struggle to maintain geometric consistency and achieve satisfactory stylization effects due to the disparity in facial feature distributions between facial photographs and stylized images, limiting the application on rare styles and mobile devices. To alleviate this, we propose to establish meaningful geometric correlations between portraits and style samples to simplify the stylization by aligning corresponding facial characteristics. Specifically, we integrate differentiable Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework to improve the training efficiency and promote the consistency of facial identities. By leveraging inherent structural information of faces, e.g., facial landmarks, TPS module can establish geometric alignments between the two domains, at global and local scales, both in pixel and feature spaces, thereby overcoming the aforementioned challenges. Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that our models not only outperforms existing models in terms of fidelity and stylistic consistency, but also achieves remarkable improvements in 2x training data efficiency and 100x less computational complexity, allowing our lightweight model to achieve real-time inference (30 FPS) at 512*512 resolution on mobile devices.