CV · Jan 21, 2025

ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions

arXiv:2501.12173v1 · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the need for more sophisticated human image generation in applications such as fashion design. The contribution is incremental, however, because it builds on existing diffusion models.

The paper targets the limited flexibility and precision of current human image generation methods. It introduces ComposeAnyone, a controllable layout-to-human generation method with decoupled multimodal conditions. In the reported experiments, ComposeAnyone aligns more closely with the given layouts, text descriptions, and reference images.

Building on the success of diffusion models, researchers have made significant advances in multimodal image generation. Among these tasks, human image generation has emerged as a promising technique with the potential to revolutionize the fashion design process. However, existing methods typically handle only text-to-image or reference-image-based human generation, which falls short of increasingly sophisticated demands. To address these limitations in flexibility and precision, we introduce ComposeAnyone, a controllable layout-to-human generation method with decoupled multimodal conditions. Specifically, our method lets each part of a hand-drawn human layout be controlled independently with text or a reference image, and it integrates these conditions seamlessly during generation. The hand-drawn layout uses color-blocked geometric shapes such as ellipses and rectangles. Because these shapes are easy to draw, the layout offers a flexible and accessible way to define spatial arrangement. We also introduce the ComposeHuman dataset, which provides decoupled text and reference-image annotations for the different components of each human image, enabling broader applications in human image generation. Extensive experiments on multiple datasets show that ComposeAnyone generates human images that align better with the given layouts, text descriptions, and reference images, demonstrating its multi-task capability and controllability.
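The abstract describes hand-drawn layouts built from color-blocked ellipses and rectangles, with each part paired with either a text description or a reference image. It does not specify the exact encoding. The sketch below is a minimal illustration under stated assumptions and is not the authors' code. It assumes a hypothetical part-to-color palette (`PART_COLORS`), hypothetical `Shape` and `PartCondition` records, and filled, axis-aligned shapes painted in order. It rasterizes such a layout into a color-block canvas and attaches one decoupled condition to each part.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Color = Tuple[int, int, int]
Canvas = List[List[Color]]

# Hypothetical part-to-color palette; the paper's actual color coding is not given here.
PART_COLORS: Dict[str, Color] = {
    "face": (255, 200, 150),
    "hair": (80, 50, 20),
    "top": (200, 30, 30),
    "bottom": (30, 30, 200),
}
BACKGROUND: Color = (0, 0, 0)


@dataclass
class Shape:
    kind: str                       # "ellipse" or "rect"
    part: str                       # key into PART_COLORS (hypothetical labels)
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


@dataclass
class PartCondition:
    text: Optional[str] = None             # free-form description of this part
    reference_image: Optional[str] = None  # path to a reference image (placeholder)


def covers(shape: Shape, x: int, y: int) -> bool:
    """True if the pixel centred at (x + 0.5, y + 0.5) lies inside the shape."""
    x0, y0, x1, y1 = shape.box
    if not (x0 <= x < x1 and y0 <= y < y1):
        return False
    if shape.kind == "rect":
        return True
    if shape.kind != "ellipse":
        raise ValueError(f"unknown shape kind: {shape.kind}")
    # Axis-aligned ellipse inscribed in the bounding box.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    rx, ry = (x1 - x0) / 2, (y1 - y0) / 2
    px, py = x + 0.5, y + 0.5
    return ((px - cx) / rx) ** 2 + ((py - cy) / ry) ** 2 <= 1.0


def rasterize_layout(shapes: List[Shape], width: int, height: int) -> Canvas:
    """Paint shapes in order onto a blank canvas; later shapes overwrite earlier ones."""
    canvas = [[BACKGROUND] * width for _ in range(height)]
    for shape in shapes:
        color = PART_COLORS[shape.part]
        x0, y0, x1, y1 = shape.box
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                if covers(shape, x, y):
                    canvas[y][x] = color
    return canvas


def pair_conditions(
    shapes: List[Shape], conditions: Dict[str, PartCondition]
) -> Dict[str, Tuple[Color, PartCondition]]:
    """Attach one text or reference-image condition to each part in the layout."""
    paired = {}
    for shape in shapes:
        cond = conditions.get(shape.part)
        if cond is None or (cond.text is None and cond.reference_image is None):
            raise ValueError(f"part '{shape.part}' has no text or image condition")
        paired[shape.part] = (PART_COLORS[shape.part], cond)
    return paired


def pixel_count(canvas: Canvas, color: Color) -> int:
    """Number of canvas pixels painted with the given color."""
    return sum(row.count(color) for row in canvas)


# Usage: an elliptical face above a rectangular top on an 8x8 canvas.
shapes = [
    Shape("ellipse", "face", (2, 0, 6, 4)),
    Shape("rect", "top", (0, 4, 8, 8)),
]
canvas = rasterize_layout(shapes, width=8, height=8)
conds = pair_conditions(shapes, {
    "face": PartCondition(text="smiling, short black hair"),
    "top": PartCondition(reference_image="refs/red_sweater.png"),
})
print(pixel_count(canvas, PART_COLORS["face"]))  # 12
```

Keeping the color canvas separate from the per-part condition map reflects the decoupling the abstract describes. Each part's color block fixes *where* that part appears, and its own text or image fixes *what* it looks like, so one part's condition can be swapped without redrawing the layout. Requiring a condition for every drawn part is a simplification made for this sketch only.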

Code Implementations: 1 repo
