Zhiwen Zuo

h-index: 11

4 papers

23 citations

Novelty: 53%

AI Score: 28

Ranked #149,809 of 194,257 authors (top 77%); #48,913 in CV (top 83%)

4 Papers

6.5 | CV | Apr 21, 2024

Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning

Zhanjie Zhang, Jiakai Sun, Guangyuan Li et al.

Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the quality of stylized images. First, we propose Style Consistency Instance Normalization (SCIN), a method to refine the alignment between content and style features. In addition, we have developed an Instance-based Contrastive Learning (ICL) approach designed to understand the relationships among various styles, thereby enhancing the quality of the resulting stylized images. Recognizing that VGG networks are more adept at extracting classification features and are not well suited to capturing style features, we have also introduced the Perception Encoder (PE) to capture style features. Extensive experiments demonstrate that our proposed method generates high-quality stylized images and effectively prevents artifacts compared with the existing state-of-the-art methods.
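
For readers unfamiliar with the "adaptive normalization" family of prior methods that the abstract contrasts itself with, the following is a minimal sketch of a generic adaptive instance normalization step. It is not the paper's SCIN module, whose details the abstract does not give. The function name `adaptive_instance_norm`, the list-of-channels feature layout, and the `eps` value are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def adaptive_instance_norm(content, style, eps=1e-5):
    """Re-scale each content channel to the matching style channel's statistics.

    content, style: lists of channels, each channel a flat list of floats
    (a hypothetical stand-in for a C x H x W feature map).
    """
    stylized = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = fmean(c_chan), pstdev(c_chan)
        s_mean, s_std = fmean(s_chan), pstdev(s_chan)
        # Normalize the content channel, then apply the style mean and std.
        stylized.append(
            [s_std * (v - c_mean) / (c_std + eps) + s_mean for v in c_chan]
        )
    return stylized


content_features = [[1.0, 2.0, 3.0]]
style_features = [[10.0, 20.0, 30.0]]
stylized_features = adaptive_instance_norm(content_features, style_features)
```

After this step each output channel carries, up to the `eps` term, the mean and standard deviation of the corresponding style channel while keeping the spatial pattern of the content channel.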

3.6 | CV | Mar 3, 2025

Fine-Grained Controllable Apparel Showcase Image Generation via Garment-Centric Outpainting

Rong Zhang, Jingnan Wang, Zhiwen Zuo et al.

In this paper, we propose a novel garment-centric outpainting (GCO) framework based on the latent diffusion model (LDM) for fine-grained controllable apparel showcase image generation. The proposed framework aims at customizing a fashion model wearing a given garment via text prompts and facial images. Different from existing methods, our framework takes a garment image segmented from a dressed mannequin or a person as the input, eliminating the need for learning cloth deformation and ensuring faithful preservation of garment details. The proposed framework consists of two stages. In the first stage, we introduce a garment-adaptive pose prediction model that generates diverse poses given the garment. Then, in the next stage, we generate apparel showcase images, conditioned on the garment and the predicted poses, along with specified text prompts and facial images. Notably, a multi-scale appearance customization module (MS-ACM) is designed to allow both overall and fine-grained text-based control over the generated model's appearance. Moreover, we leverage a lightweight feature fusion operation without introducing any extra encoders or modules to integrate multiple conditions, which is more efficient. Extensive experiments validate the superior performance of our framework compared to state-of-the-art methods.

3.3 | CV | Aug 8, 2020

Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization

Zhiwen Zuo, Lei Zhao, Zhizhong Wang et al.

Multimodal image-to-image translation (I2IT) aims to learn a conditional distribution that explores multiple possible images in the target domain given an input image in the source domain. Conditional generative adversarial networks (cGANs) are often adopted for modeling such a conditional distribution. However, cGANs are prone to ignore the latent code and learn a unimodal distribution in conditional image synthesis, which is also known as the mode collapse issue of GANs. To solve the problem, we propose a simple yet effective method that explicitly estimates and maximizes the mutual information between the latent code and the output image in cGANs by using a deep mutual information neural estimator in this paper. Maximizing the mutual information strengthens the statistical dependency between the latent code and the output image, which prevents the generator from ignoring the latent code and encourages cGANs to fully utilize the latent code for synthesizing diverse results. Our method not only provides a new perspective from information theory to improve diversity for I2IT but also achieves disentanglement between the source domain content and the target domain style for free.
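
The abstract does not spell out the estimator, but mutual information neural estimation is commonly built on the Donsker-Varadhan lower bound, I(z; y) >= E_joint[T(z, y)] - log E_marginals[exp(T(z, y))]. The sketch below evaluates that bound on a batch of paired samples, assuming this is the bound in use. The `critic` argument stands in for the neural statistics network T, and the function name and the use of all cross pairs as samples from the product of marginals are illustrative choices, not the authors' implementation.

```python
import math


def dv_mutual_information_bound(critic, latents, outputs):
    """Donsker-Varadhan lower bound on I(latent; output) from paired samples.

    critic: callable T(latent, output) -> float (hypothetical; in practice a
    trained neural network).
    latents, outputs: equal-length sequences where (latents[i], outputs[i])
    is one sample from the joint distribution.
    """
    if len(latents) != len(outputs) or not latents:
        raise ValueError("latents and outputs must be non-empty and equal length")

    # Expectation of the critic under the joint distribution.
    joint_term = sum(critic(z, y) for z, y in zip(latents, outputs)) / len(latents)

    # All cross pairs approximate samples from the product of the marginals.
    cross_scores = [critic(z, y) for z in latents for y in outputs]
    marginal_term = math.log(
        sum(math.exp(s) for s in cross_scores) / len(cross_scores)
    )
    return joint_term - marginal_term
```

In a training loop the bound would be maximized with respect to both the critic and the generator, which is what pushes the generator to keep its output dependent on the latent code. A critic that ignores its inputs yields a bound of zero, so any positive value reflects dependence that the critic has detected.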

0.9 | CV | Jan 21, 2019

On Compression of Unsupervised Neural Nets by Pruning Weak Connections

Zhiwen Zuo, Lei Zhao, Liwen Zuo et al.

Unsupervised neural nets such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs) are powerful in automatic feature extraction, unsupervised weight initialization and density estimation. In this paper, we demonstrate that the parameters of these neural nets can be dramatically reduced without affecting their performance. We describe a method to reduce the parameters required by an RBM, which is the basic building block for deep architectures. Further, we propose an unsupervised sparse deep architecture selection algorithm to form sparse deep neural networks. Experimental results show that there is virtually no loss in either generative or discriminative performance.
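
As a rough illustration of what pruning weak connections can look like, the sketch below zeroes out the smallest-magnitude entries of a weight matrix and returns a mask that could be used to keep them at zero during further training. It assumes that "weak" means small absolute weight, which the abstract does not state, and the function name and `fraction` parameter are hypothetical rather than taken from the paper.

```python
def prune_weak_connections(weights, fraction):
    """Zero out the given fraction of weights with the smallest magnitude.

    weights: list of rows (e.g. the visible-by-hidden matrix of an RBM).
    fraction: share of connections to remove, between 0 and 1.
    Returns (pruned_weights, mask) where mask[i][j] is 0 for removed entries.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # Rank every connection by absolute weight, weakest first.
    ranked = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    n_prune = int(len(ranked) * fraction)

    mask = [[1] * len(row) for row in weights]
    for _, i, j in ranked[:n_prune]:
        mask[i][j] = 0

    pruned = [
        [w if keep else 0.0 for w, keep in zip(row, mask_row)]
        for row, mask_row in zip(weights, mask)
    ]
    return pruned, mask


example_weights = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.2]]
pruned_weights, keep_mask = prune_weak_connections(example_weights, 0.5)
```

With `fraction=0.5` on the six example weights, the three entries of smallest magnitude (0.01, -0.05 and 0.2) are removed and the remaining three are left unchanged.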