HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
This work addresses a specific issue in image generation that matters for applications requiring realistic human depictions, but it is an incremental improvement focused on a narrow domain.
The paper tackles the problem of diffusion models generating malformed human hands by introducing HandRefiner, a lightweight post-processing method that uses conditional inpainting to correct hand shapes and finger counts. The authors report significant quantitative and qualitative improvements in generation quality.
Diffusion models have achieved remarkable success in generating realistic images but struggle to generate accurate human hands, often producing incorrect finger counts or irregular shapes. This difficulty arises from the complex task of learning the physical structure and pose of hands from training images, which involves extensive deformations and occlusions. To generate correct hands, our paper introduces a lightweight post-processing solution called $\textbf{HandRefiner}$. HandRefiner employs a conditional inpainting approach to rectify malformed hands while leaving other parts of the image untouched. We leverage a hand mesh reconstruction model that consistently adheres to the correct number of fingers and hand shape, while also being capable of fitting the desired hand pose in the generated image. Given an image whose generation failed because of malformed hands, we use ControlNet modules to re-inject this correct hand information. Additionally, we uncover a phase transition phenomenon within ControlNet as we vary the control strength. This phenomenon enables us to take advantage of more readily available synthetic data without suffering from the domain gap between realistic and synthetic hands. Experiments demonstrate that HandRefiner can significantly improve the generation quality quantitatively and qualitatively. The code is available at https://github.com/wenquanlu/HandRefiner.
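The property of rectifying hands "while leaving other parts of the image untouched" can be illustrated with a toy masking step. The following is a minimal sketch under stated assumptions, not the HandRefiner implementation: it shows only pixel-level compositing with a binary mask, whereas the method described above performs inpainting with a diffusion model conditioned through ControlNet. The names `box_mask` and `composite`, the rectangular hand region, and the toy pixel values are all hypothetical.

```python
# Toy illustration of mask-based compositing: only the masked (hand) region
# is replaced, every other pixel is copied from the original image.
# All function names and values here are hypothetical.

def box_mask(height, width, box):
    """Return a height x width mask: 1 inside box (top, left, bottom, right), else 0."""
    top, left, bottom, right = box
    return [
        [1 if top <= r < bottom and left <= c < right else 0 for c in range(width)]
        for r in range(height)
    ]

def composite(original, refined, mask):
    """Take refined pixels where the mask is 1 and original pixels elsewhere."""
    return [
        [ref if m else orig for orig, ref, m in zip(o_row, r_row, m_row)]
        for o_row, r_row, m_row in zip(original, refined, mask)
    ]

# 4x4 grayscale stand-ins; the box plays the role of a detected hand region.
original = [[10] * 4 for _ in range(4)]
refined = [[99] * 4 for _ in range(4)]
mask = box_mask(4, 4, (1, 1, 3, 3))
result = composite(original, refined, mask)

for row in result:
    print(row)
```

In this sketch the four centre pixels take the refined values and the border keeps the original ones, which is the behaviour an inpainting mask is meant to guarantee for the non-hand regions.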