HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images
HandCraft addresses a specific pain point for users of text-to-image models: it improves hand rendering without requiring model retraining. The contribution is incremental, since it builds on existing diffusion-based editing techniques.
The paper tackles the problem of anatomically incorrect hands in images generated by diffusion models. It proposes HandCraft, a method that restores malformed hands using masks and depth images derived from a parametric model. Qualitative and quantitative evaluation shows that the restored hands integrate seamlessly while the integrity of the rest of the image is preserved.
Generative text-to-image models, such as Stable Diffusion, have demonstrated a remarkable ability to generate diverse, high-quality images. However, they are surprisingly inept when it comes to rendering human hands, which are often anatomically incorrect or reside in the "uncanny valley". In this paper, we propose HandCraft, a method for restoring such malformed hands. This is achieved by automatically constructing masks and depth images for hands as conditioning signals using a parametric model, allowing a diffusion-based image editor to fix the hand's anatomy and adjust its pose while seamlessly integrating the changes into the original image, preserving pose, color, and style. Our plug-and-play hand restoration solution is compatible with existing pretrained diffusion models, and because the restoration process requires no fine-tuning or training of the diffusion models, it is easy to adopt. We also contribute MalHand datasets that contain generated images with a wide variety of malformed hands in several styles for hand detector training and hand restoration benchmarking, and demonstrate through qualitative and quantitative evaluation that HandCraft not only restores anatomical correctness but also maintains the integrity of the overall image.
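To make the conditioning step in the abstract more concrete, the following is a minimal sketch of how a mask and a depth image might be built once a parametric hand model has been fitted. It is not the authors' implementation. The function name `hand_conditioning`, its parameters, the padded-bounding-box mask, and the depth scaling to the range [0.2, 1.0] are all assumptions made for illustration; the sketch also assumes the mesh vertices have already been projected to pixel coordinates.

```python
import numpy as np


def hand_conditioning(vertices_px, depths, image_hw, pad=8):
    """Build a binary mask and a normalized depth image for one hand.

    Hypothetical helper, not taken from the paper.
    vertices_px: (N, 2) array of projected mesh vertices as (x, y) pixels.
    depths:      (N,) array of camera-space depths, one per vertex.
    image_hw:    (height, width) of the image being restored.
    pad:         pixels of margin added around the hand's bounding box.
    """
    h, w = image_hw
    xs = np.clip(np.round(vertices_px[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(vertices_px[:, 1]).astype(int), 0, h - 1)

    # Depth image: background is 0, nearest vertex is 1.0, farthest is 0.2
    # (assumed scaling, chosen so the hand never blends into the background).
    d_min, d_max = depths.min(), depths.max()
    t = (depths - d_min) / max(d_max - d_min, 1e-8)
    values = (0.2 + 0.8 * (1.0 - t)).astype(np.float32)
    depth = np.zeros((h, w), dtype=np.float32)
    # If several vertices land on one pixel, keep the nearest (brightest).
    np.maximum.at(depth, (ys, xs), values)

    # Mask: padded bounding box around the hand, marking the region
    # that an image editor would be allowed to repaint.
    mask = np.zeros((h, w), dtype=np.uint8)
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, h - 1)
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, w - 1)
    mask[y0:y1 + 1, x0:x1 + 1] = 1
    return mask, depth


# Toy usage with three made-up vertices on a 32x32 image.
vertices = np.array([[10.0, 12.0], [20.0, 18.0], [15.0, 15.0]])
vertex_depths = np.array([0.5, 0.7, 0.6])
mask, depth = hand_conditioning(vertices, vertex_depths, (32, 32), pad=2)
```

In a full pipeline, the mask would restrict editing to the hand region and the depth image would serve as the conditioning signal for a depth-conditioned, diffusion-based image editor. A real system would likely rasterize the full mesh surface rather than isolated vertices; the vertex splatting here only keeps the sketch short.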