CV · AI · Mar 4, 2024

HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances

arXiv:2403.01693v3 · 50 citations · h-index: 19 · CVPR
Originality: Incremental advance
AI Analysis

The work addresses a specific problem in text-to-image generation that matters for applications requiring realistic depictions of people. It is an incremental improvement focused on hand realism.

The paper tackles hand generation in text-to-image models, which often produce artifacts such as irregular poses and incorrect finger counts. It proposes HanDiffuser, a diffusion-based architecture that injects hand embeddings into the generative process, and supports its claim of high-quality hand generation with quantitative experiments and user studies.

Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands. Common artifacts include irregular hand poses and shapes, incorrect numbers of fingers, and physically implausible finger orientations. To generate images with realistic hands, we propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process. HanDiffuser consists of two components: a Text-to-Hand-Params diffusion model to generate SMPL-Body and MANO-Hand parameters from input text prompts, and a Text-Guided Hand-Params-to-Image diffusion model to synthesize images by conditioning on the prompts and hand parameters generated by the previous component. We incorporate multiple aspects of hand representation, including 3D shapes and joint-level finger positions, orientations and articulations, for robust learning and reliable performance during inference. We conduct extensive quantitative and qualitative experiments and perform user studies to demonstrate the efficacy of our method in generating images with high-quality hands.
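The two-stage data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: both noise predictors are untrained random linear maps (the hypothetical `ToyEpsModel`), the parameter sizes assume common SMPL (72 pose + 10 shape values) and MANO (45 finger-pose + 10 shape values per hand) conventions, and stage 2 receives the hand/body parameters by plain concatenation with the text embedding, whereas HanDiffuser injects learned hand embeddings. The function names `ddpm_sample` and `two_stage_generate` are illustrative only. What the sketch does show is the chaining: a text embedding conditions a DDPM sampler over body/hand parameters, and those sampled parameters then condition a second DDPM sampler over the image.

```python
import numpy as np

# Assumed sizes, following common SMPL/MANO conventions (the paper's exact
# parameterization may differ):
SMPL_DIM = 72 + 10          # body pose (24 joints x 3 axis-angle) + 10 shape betas
MANO_DIM = 2 * (45 + 10)    # per hand: 15 finger joints x 3 axis-angle + 10 betas
PARAM_DIM = SMPL_DIM + MANO_DIM
TEXT_DIM = 16               # toy text-embedding size (hypothetical)


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Standard linear DDPM noise schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def ddpm_sample(eps_model, dim, cond, T=50, seed=0):
    """Ancestral DDPM sampling: start from Gaussian noise, denoise for T steps."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = linear_beta_schedule(T)
    x = rng.standard_normal(dim)
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)  # predicted noise at step t
        # Mean of p(x_{t-1} | x_t) under the epsilon parameterization
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(dim) if t > 0 else 0.0  # no noise on last step
        x = mean + np.sqrt(betas[t]) * noise
    return x


class ToyEpsModel:
    """Hypothetical stand-in for a trained conditional noise predictor:
    a fixed random linear map of the noisy sample and the condition."""

    def __init__(self, x_dim, cond_dim, seed):
        rng = np.random.default_rng(seed)
        self.w_x = 0.01 * rng.standard_normal((x_dim, x_dim))
        self.w_c = 0.01 * rng.standard_normal((cond_dim, x_dim))

    def __call__(self, x, t, cond):
        return x @ self.w_x + cond @ self.w_c  # ignores t, unlike a real model


def two_stage_generate(text_emb, image_shape=(8, 8, 3)):
    # Stage 1 (Text-to-Hand-Params): text embedding -> SMPL + MANO parameters
    stage1 = ToyEpsModel(PARAM_DIM, TEXT_DIM, seed=1)
    params = ddpm_sample(stage1, PARAM_DIM, text_emb, seed=10)

    # Stage 2 (Hand-Params-to-Image): condition on the prompt AND the sampled
    # params. Concatenation is an assumption; HanDiffuser injects hand embeddings.
    cond = np.concatenate([text_emb, params])
    img_dim = int(np.prod(image_shape))
    stage2 = ToyEpsModel(img_dim, cond.shape[0], seed=2)
    image = ddpm_sample(stage2, img_dim, cond, seed=20).reshape(image_shape)
    return params, image


# Usage: a random vector stands in for a real text encoder's output.
text_emb = np.random.default_rng(42).standard_normal(TEXT_DIM)
params, image = two_stage_generate(text_emb)
print(params.shape, image.shape)  # (192,) (8, 8, 3)
```

In this sketch the only link between the two stages is the sampled parameter vector, so the image sampler always sees an explicit body and hand configuration alongside the prompt rather than having to infer finger layout from text alone.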
