CV, Nov 28, 2022
Realtime Data-Efficient Portrait Stylization Based On Geometric Alignment
Xinrui Wang, Zhuoru Li, Xiao Zhou et al.
Portrait stylization aims to imbue portrait photos with vivid artistic effects drawn from style examples. Despite the availability of enormous training datasets and large network weights, existing methods struggle to maintain geometric consistency and achieve satisfactory stylization effects because of the disparity in facial feature distributions between facial photographs and stylized images, which limits their application to rare styles and mobile devices. To alleviate this, we propose to establish meaningful geometric correlations between portraits and style samples, simplifying stylization by aligning corresponding facial characteristics. Specifically, we integrate differentiable Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework to improve training efficiency and promote the consistency of facial identities. By leveraging inherent structural information of faces, e.g., facial landmarks, the TPS module can establish geometric alignments between the two domains, at global and local scales, in both pixel and feature spaces, thereby overcoming the aforementioned challenges. Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that our models not only outperform existing models in terms of fidelity and stylistic consistency, but also achieve 2x better training data efficiency and 100x lower computational complexity, allowing our lightweight model to achieve real-time inference (30 FPS) at 512×512 resolution on mobile devices.
CV, Mar 6
Towards High-resolution and Disentangled Reference-based Sketch Colorization
Dingkun Yan, Xinrui Wang, Ru Wang et al.
Sketch colorization is a critical task for automating and assisting in the creation of animations and digital illustrations. Previous research identified the primary difficulty as the distribution shift between semantically aligned training data and highly diverse test data, and focused on mitigating the artifacts caused by the distribution shift instead of fundamentally resolving the problem. In this paper, we present a framework that directly minimizes the distribution shift, thereby achieving superior quality, resolution, and controllability of colorization. We propose a dual-branch framework that explicitly models the data distributions of the training and inference processes with a semantic-aligned branch and a semantic-misaligned branch, respectively. A Gram Regularization Loss is applied across the feature maps of both branches, effectively enforcing cross-domain distribution coherence and stability. Furthermore, we adopt an anime-specific Tagger Network to extract fine-grained attributes from reference images and modulate SDXL's conditional encoders to ensure precise control, and a plugin module to enhance texture transfer. Quantitative and qualitative comparisons, alongside user studies, confirm that our method effectively overcomes the distribution shift challenge, establishing state-of-the-art performance across both quality and controllability metrics. An ablation study reveals the influence of each component.
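The abstract does not give the exact form of the Gram Regularization Loss, so the following is only a minimal sketch of the general idea, assuming the loss is the mean squared difference between the Gram matrices of the two branches' feature maps. The function names, the normalization by the number of spatial positions, and the toy feature values are all illustrative assumptions, not the authors' implementation.

```python
def gram_matrix(features):
    """Channel-by-channel Gram matrix.

    features: list of C channels, each a flat list of N spatial activations.
    Normalizing by N is an assumption made for this sketch.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def gram_regularization_loss(feats_a, feats_b):
    """Mean squared difference between two Gram matrices (assumed form)."""
    ga = gram_matrix(feats_a)
    gb = gram_matrix(feats_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


# Toy feature maps: 2 channels, 4 spatial positions each.
aligned = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
# Same activations with the spatial positions reversed.
misaligned = [[4.0, 3.0, 2.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
# Different channel statistics (first channel doubled).
shifted = [[2.0, 4.0, 6.0, 8.0], [0.0, 1.0, 0.0, 1.0]]

loss_same_stats = gram_regularization_loss(aligned, misaligned)
loss_diff_stats = gram_regularization_loss(aligned, shifted)
```

In this toy case the Gram matrix is unchanged when spatial positions are permuted, so the loss compares channel statistics rather than pixel-wise layout. That property is one plausible reason a Gram-based term suits a pair of spatially aligned and misaligned branches.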
CV, Feb 27, 2025
Image Referenced Sketch Colorization Based on Animation Creation Workflow
Dingkun Yan, Xinrui Wang, Zhuoru Li et al.
Sketch colorization plays an important role in animation and digital illustration production. However, existing methods still have shortcomings: text-guided methods fail to provide accurate color and style reference, hint-guided methods still involve manual operation, and image-referenced methods are prone to artifacts. To address these limitations, we propose a diffusion-based framework inspired by real-world animation production workflows. Our approach uses the sketch as spatial guidance and an RGB image as the color reference, and separately extracts foreground and background from the reference image with spatial masks. In particular, we introduce a split cross-attention mechanism with LoRA (Low-Rank Adaptation) modules. The modules are trained separately on foreground and background regions to control the corresponding embeddings for keys and values in cross-attention. This design allows the diffusion model to integrate information from foreground and background independently, preventing interference and eliminating spatial artifacts. For inference, we design switchable modes for diverse use scenarios by changing which modules are activated in the framework. Extensive qualitative and quantitative experiments, along with user studies, demonstrate our advantages over existing methods in generating high-quality, artifact-free results with geometrically mismatched references. Ablation studies further confirm the effectiveness of each component. Code is available at https://github.com/tellurion-kanata/colorizeDiffusion.
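As a rough illustration of the split cross-attention idea, the sketch below applies the standard LoRA update W' = W + (alpha / r) · B A to a shared key projection, with one low-rank pair for foreground tokens and another for background tokens. Routing tokens by a boolean mask, the function names, and all numeric values are assumptions for illustration. The abstract describes the mechanism only at a high level, so this is not the authors' implementation.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_weight(w, a, b, alpha=1.0):
    """Standard LoRA update: W' = W + (alpha / r) * B @ A.

    w: d_out x d_in base weight, a: r x d_in, b: d_out x r.
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def split_keys(tokens, is_foreground, w_key, lora_fg, lora_bg, alpha=1.0):
    """Project reference tokens to keys, choosing the LoRA pair by mask.

    The mask-based routing is a hypothetical reading of "split
    cross-attention"; values would be handled the same way.
    """
    a_fg, b_fg = lora_fg
    a_bg, b_bg = lora_bg
    w_fg = lora_weight(w_key, a_fg, b_fg, alpha)
    w_bg = lora_weight(w_key, a_bg, b_bg, alpha)
    keys = []
    for tok, fg in zip(tokens, is_foreground):
        w = w_fg if fg else w_bg
        keys.append([sum(wi * xi for wi, xi in zip(row, tok)) for row in w])
    return keys


# Toy setup: identity base projection, rank-1 adapters.
w_key = [[1.0, 0.0], [0.0, 1.0]]
lora_fg = ([[1.0, 0.0]], [[1.0], [0.0]])  # (A, B) for foreground
lora_bg = ([[0.0, 1.0]], [[0.0], [1.0]])  # (A, B) for background
tokens = [[1.0, 1.0], [1.0, 1.0]]
mask = [True, False]  # first token is foreground, second is background

keys = split_keys(tokens, mask, w_key, lora_fg, lora_bg)
```

Here two identical tokens receive different keys depending on the region they belong to, which is the kind of separation that lets foreground and background information be integrated independently.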
CV, Sep 6, 2021
The Animation Transformer: Visual Correspondence via Segment Matching
Evan Casey, Víctor Pérez, Zhuoru Li et al.
Visual correspondence is a fundamental building block on the way to building assistive tools for hand-drawn animation. However, while a large body of work has focused on learning visual correspondences at the pixel-level, few approaches have emerged to learn correspondence at the level of line enclosures (segments) that naturally occur in hand-drawn animation. Exploiting this structure in animation has numerous benefits: it avoids the intractable memory complexity of attending to individual pixels in high resolution images and enables the use of real-world animation datasets that contain correspondence information at the level of per-segment colors. To that end, we propose the Animation Transformer (AnT) which uses a transformer-based architecture to learn the spatial and visual relationships between segments across a sequence of images. AnT enables practical ML-assisted colorization for professional animation workflows and is publicly accessible as a creative tool in Cadmium.
CV, Jul 13, 2021
Learning Aesthetic Layouts via Visual Guidance
Qingyuan Zheng, Zhuoru Li, Adam Bargteil
We explore computational approaches for visual guidance to aid in creating aesthetically pleasing art and graphic design. Our work complements and builds on previous work that developed models for how humans look at images. Our approach comprises three steps. First, we collected a dataset of art masterpieces and labeled the visual fixations with state-of-the-art vision models. Second, we clustered the visual guidance templates of the art masterpieces with unsupervised learning. Third, we developed a pipeline that uses generative adversarial networks to learn the principles of visual guidance and can produce aesthetically pleasing layouts. We show that the aesthetic visual guidance principles can be learned and integrated into a high-dimensional model and can be queried by the features of graphic elements. We evaluate our approach by generating layouts on various drawings and graphic designs. Moreover, our model considers the color and structure of graphic elements when generating layouts. Consequently, we believe our tool, which generates multiple aesthetic layout options in seconds, can help artists create beautiful art and graphic designs.
CV, Feb 26, 2020
Learning to Shadow Hand-drawn Sketches
Qingyuan Zheng, Zhuoru Li, Adam Bargteil
We present a fully automatic method to generate detailed and accurate artistic shadows from pairs of line drawing sketches and lighting directions. We also contribute a new dataset of one thousand examples of pairs of line drawings and shadows that are tagged with lighting directions. Remarkably, the generated shadows quickly communicate the underlying 3D structure of the sketched scene. Consequently, the shadows generated by our approach can be used directly or as an excellent starting point for artists. We demonstrate that the deep learning network we propose takes a hand-drawn sketch, builds a 3D model in latent space, and renders the resulting shadows. The generated shadows respect the hand-drawn lines and underlying 3D space and contain sophisticated and accurate details, such as self-shadowing effects. Moreover, the generated shadows contain artistic effects, such as rim lighting or halos appearing from back lighting, that would be achievable with traditional 3D rendering methods.