Xiaoming Li

h-index 19

4 papers

75 citations

Novelty 38%

AI Score 36

Ranked #97,205 of 194,257 authors (top 50%)
#32,661 in CV (top 55%)

4 Papers

9.8 · CV · Nov 29, 2023 · Code

When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation

Xiaoming Li, Xinyu Hou, Chen Change Loy

Text-to-image diffusion models have remarkably excelled in producing diverse, high-quality, and photo-realistic images. This advancement has spurred a growing interest in incorporating specific identities into generated content. Most current methods employ an inversion approach to embed a target visual concept into the text embedding space using a single reference image. However, the newly synthesized faces either closely resemble the reference image in terms of facial attributes, such as expression, or exhibit a reduced capacity for identity preservation. Text descriptions intended to guide the facial attributes of the synthesized face may fall short, owing to the intricate entanglement of identity information with identity-irrelevant facial attributes derived from the reference image. To address these issues, we present the novel use of the extended StyleGAN embedding space $\mathcal{W}_+$, to achieve enhanced identity preservation and disentanglement for diffusion models. By aligning this semantically meaningful human face latent space with text-to-image diffusion models, we succeed in maintaining high fidelity in identity preservation, coupled with the capacity for semantic editing. Additionally, we propose new training objectives to balance the influences of both prompt and identity conditions, ensuring that the identity-irrelevant background remains unaffected during facial attribute modifications. Extensive experiments reveal that our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions in diverse settings. Our source code will be available at https://github.com/csxmli2016/w-plus-adapter.

6.2 · CV · Jul 14, 2025 · Code

RefSTAR: Blind Facial Image Restoration with Reference Selection, Transfer, and Reconstruction

Zhicun Yin, Junjie Chen, Ming Liu et al.

Blind facial image restoration is highly challenging due to unknown complex degradations and the sensitivity of humans to faces. Although existing methods introduce auxiliary information from generative priors or high-quality reference images, they still struggle with identity preservation problems, mainly due to improper feature introduction on detailed textures. In this paper, we focus on effectively incorporating appropriate features from high-quality reference images, presenting a novel blind facial image restoration method that considers reference selection, transfer, and reconstruction (RefSTAR). In terms of selection, we construct a reference selection (RefSel) module. For training the RefSel module, we construct a RefSel-HQ dataset through a mask generation pipeline, which contains annotating masks for 10,000 ground truth-reference pairs. As for the transfer, due to the trivial solution in vanilla cross-attention operations, a feature fusion paradigm is designed to force the features from the reference to be integrated. Finally, we propose a reference image reconstruction mechanism that further ensures the presence of reference image features in the output image. The cycle consistency loss is also redesigned in conjunction with the mask. Extensive experiments on various backbone models demonstrate superior performance, showing better identity preservation ability and reference feature transfer quality. Source code, dataset, and pre-trained models are available at https://github.com/yinzhicun/RefSTAR.

5.8 · HC · Aug 9, 2015

Preprint Virtual Reality Based GIS Analysis Platform

Weixi Wang, Zhihan Lv, Xiaoming Li et al.

This is the preprint version of our paper at ICONIP 2015. The proposed platform supports integrated VRGIS functions, including 3D spatial analysis and 3D visualization of spatial processes, and serves 3D globe and digital city applications. The platform performs 3D analysis and visualization of massive city information. The amount of information that can be visualized with this platform is overwhelming, and the GIS-based navigational scheme gives great flexibility in accessing the different available data sources.

7.8 · HC · Apr 4, 2015

3D visual analysis of seabed on smartphone

Zhihan Lv, Tianyun Su, Xiaoming Li et al.

We create a 'virtual-seabed' platform to realize 3D visual analysis of the seabed on a smartphone. The 3D seabed platform is based on a 'section-drilling' model and implements visualization and analysis of integrated seabed data in a 3D browser on the smartphone. Several 3D visual analysis functions are developed. This work presents a thorough and interesting way of presenting seabed data on a smartphone, which opens up many application possibilities. This platform is another practical demonstration built on our WebVRGIS platform.