42.3CVMar 17
Semi-supervised Latent Disentangled Diffusion Model for Textile Pattern GenerationChenggong Hu, Yi Wang, Mengqi Xue et al.
Textile pattern generation (TPG) aims to synthesize fine-grained textile pattern images based on given clothing images. Although previous studies have not explicitly investigated TPG, existing image-to-image models appear to be natural candidates for this task. However, when applied directly, these methods often produce unfaithful results, failing to preserve fine-grained details due to feature confusion between complex textile patterns and the inherent non-rigid texture distortions in clothing images. In this paper, we propose a novel method, SLDDM-TPG, for faithful and high-fidelity TPG. Our method consists of two stages: (1) a latent disentangled network (LDN) that resolves feature confusion in clothing representations and constructs a multi-dimensional, independent clothing feature space; and (2) a semi-supervised latent diffusion model (S-LDM), which receives guidance signals from LDN and generates faithful results through semi-supervised diffusion training, combined with our designed fine-grained alignment strategy. Extensive evaluations show that SLDDM-TPG reduces FID by 4.1 and improves SSIM by up to 0.116 on our CTP-HD dataset, and also demonstrate good generalization on the VITON-HD dataset.
CVMay 8, 2025Code
Diffusion Model Quantization: A ReviewQian Zeng, Chenggong Hu, Mingli Song et al.
Recent success of large text-to-image models has empirically underscored the exceptional performance of diffusion models in generative tasks. To facilitate their efficient deployment on resource-constrained edge devices, model quantization has emerged as a pivotal technique for both compression and acceleration. This survey offers a thorough review of the latest advancements in diffusion model quantization, encapsulating and analyzing the current state of the art in this rapidly advancing domain. First, we provide an overview of the key challenges encountered in the quantization of diffusion models, including those based on U-Net architectures and Diffusion Transformers (DiT). We then present a comprehensive taxonomy of prevalent quantization techniques, engaging in an in-depth discussion of their underlying principles. Subsequently, we perform a meticulous analysis of representative diffusion model quantization schemes from both qualitative and quantitative perspectives. From a quantitative standpoint, we rigorously benchmark a variety of methods using widely recognized datasets, delivering an extensive evaluation of the most recent and impactful research in the field. From a qualitative standpoint, we categorize and synthesize the effects of quantization errors, elucidating these impacts through both visual analysis and trajectory examination. In conclusion, we outline prospective avenues for future research, proposing novel directions for the quantization of generative models in practical applications. The list of related papers, corresponding codes, pre-trained models and comparison results are publicly available at the survey project homepage https://github.com/TaylorJocelyn/Diffusion-Model-Quantization.
AINov 24, 2025
HuggingR$^{4}$: A Progressive Reasoning Framework for Discovering Optimal Model CompanionsShaoyin Ma, Chenggong Hu, Huiqiong Wang et al.
Building effective LLM agents increasingly requires selecting appropriate AI models as tools from large open repositories (e.g., HuggingFace with > 2M models) based on natural language requests. Unlike invoking a fixed set of API tools, repository-scale model selection must handle massive, evolving candidates with incomplete metadata. Existing approaches incorporate full model descriptions into prompts, resulting in prompt bloat, excessive token costs, and limited scalability. To address these issues, we propose HuggingR$^4$, the first framework to recast model selection as an iterative reasoning process rather than one-shot retrieval. By synergistically integrating Reasoning, Retrieval, Refinement, and Reflection, HuggingR$^4$ progressively decomposes user intent, retrieves candidates through multi-round deliberation, refines selections via fine-grained analysis, and validates results through reflection. To facilitate rigorous evaluation, we introduce a large-scale benchmark comprising 14,399 diverse user requests across 37 task categories. Experiments demonstrate that HuggingR$^4$ achieves 92.03% workability and 82.46% reasonability-outperforming current state-of-the-art baselines by 26.51% and 33.25%, respectively, while reducing token consumption by $6.9 \times$.