Weiming Zhao

CV
h-index 25
4 papers
25 citations
Novelty 53%
AI Score 36

4 Papers

CV | Apr 22, 2024
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting

Weili Zeng, Yichao Yan, Qi Zhu et al.

Text-to-image (T2I) customization aims to create images that embody specific visual concepts delineated in textual descriptions. However, existing works still face a main challenge: concept overfitting. To tackle this challenge, we first analyze overfitting, categorizing it into concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which confines customization to the limited modalities seen in training, i.e., backgrounds, layouts, and styles. To evaluate the degree of overfitting, we further introduce two metrics, the Latent Fisher divergence and the Wasserstein metric, to measure the distribution changes of non-customized and customized concepts, respectively. Drawing on this analysis, we propose Infusion, a T2I customization method that lets target concepts be learned without being constrained by limited training modalities while preserving non-customized knowledge. Infusion achieves this efficiently, requiring only 11KB of trained parameters. Extensive experiments also demonstrate that our approach outperforms state-of-the-art methods in both single- and multi-concept customized generation.
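As a rough illustration of what a Wasserstein-type measure of distribution change computes, the sketch below implements the one-dimensional Wasserstein-1 distance between two equal-size empirical samples. This is not the metric defined in the paper: the function name `wasserstein_1d` and the scalar-sample setting are assumptions made only for this example, and the abstract does not say on which features the authors evaluate it.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    For equal-size samples on the real line, the optimal transport plan
    matches the i-th smallest value of one sample to the i-th smallest
    value of the other, so the distance is the mean absolute difference
    of the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical feature values before and after customization.
before = [0.0, 1.0, 2.0]
after = [1.0, 2.0, 3.0]
shift = wasserstein_1d(before, after)
```

A larger value means the two samples are further apart; identical samples, in any order, give a distance of zero.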

CV | Apr 23, 2024
IPAD: Industrial Process Anomaly Detection Dataset

Jinfan Liu, Yichao Yan, Junjie Li et al.

Video anomaly detection (VAD) is a challenging task that aims to recognize anomalies in video frames, and existing large-scale VAD research primarily focuses on road traffic and human activity scenes. Industrial scenes often contain a variety of unpredictable anomalies, so VAD methods can play a significant role there. However, applicable datasets and methods tailored to industrial production scenarios are lacking because of privacy and security concerns. To bridge this gap, we propose a new dataset, IPAD, specifically designed for VAD in industrial scenarios. The industrial processes in our dataset are chosen through on-site factory research and discussions with engineers. The dataset covers 16 different industrial devices and contains over 6 hours of both synthetic and real-world video footage. Moreover, we annotate the key feature of the industrial process, i.e., periodicity. Based on the proposed dataset, we introduce a period memory module and a sliding window inspection mechanism to exploit the periodic information in a basic reconstruction model. Our framework uses a LoRA adapter to explore the effective migration of pretrained models, initially trained on synthetic data, to real-world scenarios. Our proposed dataset and method will fill the gap in the field of industrial video anomaly detection and advance video understanding tasks as well as smart factory deployment.

CV | Mar 1, 2025
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture

Xuanchen Li, Jianyu Wang, Yuhao Cheng et al.

Significant progress has been made in speech-driven 3D face animation, but most works focus on learning the motion of the mesh or geometry and ignore the impact of dynamic texture. In this work, we reveal that dynamic texture plays a key role in rendering high-fidelity talking avatars, and introduce a high-resolution 4D dataset, TexTalk4D, consisting of 100 minutes of audio-synced scan-level meshes with detailed 8K dynamic textures from 100 subjects. Based on the dataset, we explore the inherent correlation between motion and texture, and propose a diffusion-based framework, TexTalker, to simultaneously generate facial motions and dynamic textures from speech. Furthermore, we propose a novel pivot-based style injection strategy to capture the complexity of different texture and motion styles, which allows disentangled control. TexTalker, as the first method to generate audio-synced facial motion with dynamic texture, not only outperforms prior art in synthesizing facial motions, but also produces realistic textures that are consistent with the underlying facial movements. Project page: https://xuanchenli.github.io/TexTalk/.

LG | Oct 12, 2025
Multi-scale Frequency-Aware Adversarial Network for Parkinson's Disease Assessment Using Wearable Sensors

Weiming Zhao, Xulong Wang, Jun Qi et al.

Severity assessment of Parkinson's disease (PD) using wearable sensors offers an effective, objective basis for clinical management. However, general-purpose time series models often lack pathological specificity in feature extraction, making it difficult to capture subtle signals highly correlated with PD. Furthermore, the temporal sparsity of PD symptoms causes key diagnostic features to be easily "diluted" by traditional aggregation methods, further complicating assessment. To address these issues, we propose the Multi-scale Frequency-Aware Adversarial Multi-Instance Network (MFAM). This model enhances feature specificity through a frequency decomposition module guided by medical prior knowledge. In addition, by introducing an attention-based multi-instance learning (MIL) framework, the model can adaptively focus on the most diagnostically valuable sparse segments. We comprehensively validated MFAM on both the public PADS dataset, for binary classification of PD versus differential diagnosis (DD), and a private dataset for four-class severity assessment. Experimental results demonstrate that MFAM outperforms general-purpose time series models on complex clinical time series, providing a promising solution for automated assessment of PD severity.
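To make the contrast with "diluting" aggregation concrete, here is a minimal sketch of generic attention-based MIL pooling, in which each segment embedding receives a softmax weight and the bag embedding is the weighted sum. It is not the MFAM implementation: the function name `attention_mil_pool`, the projection `V`, the scoring vector `w`, and the toy inputs are all assumptions for illustration, and in a real model `V` and `w` would be learned.

```python
import math


def attention_mil_pool(instances, V, w):
    """Pool a bag of instance embeddings into a single bag embedding.

    instances: list of K vectors, each of length D (one per time segment)
    V: L x D projection matrix as a list of rows (hypothetical parameters)
    w: scoring vector of length L (hypothetical parameters)
    Returns (bag_embedding, attention_weights).
    """
    # Score each instance: w^T tanh(V h)
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * uj for wj, uj in zip(w, hidden)))

    # Numerically stable softmax over the instances in the bag
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding is the attention-weighted sum of instance embeddings
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag: two uninformative segments and one segment with a strong signal.
segments = [[0.0, 0.0], [0.0, 0.0], [3.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [5.0, 0.0]
bag, weights = attention_mil_pool(segments, V, w)
```

With mean pooling the informative segment would contribute only a third of the bag embedding; with these toy parameters the attention weights place almost all of the mass on it instead.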