Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
This work addresses the need for high-quality bare hand appearance data for downstream learning tasks in computer vision, offering a novel solution to marker-induced degradations.
The paper tackles the problem of recovering bare hand appearance from marker-degraded images by proposing a semi-supervised framework that disentangles hand structure and uses dual adversarial discrimination, achieving robust photo-realistic recovery on diverse datasets.
Large numbers of hand images with reliable annotations have been collected through marker-based motion capture (MoCap). Unfortunately, degradations caused by the markers limit their use in hand appearance reconstruction. An intuitive approach to appearance recovery is image-to-image translation trained with unpaired data. However, most such frameworks fail because of the structural inconsistency between a degraded hand and a bare one. The core of our approach is to first disentangle the bare hand structure from the degraded images and then wrap the appearance onto this structure with a dual adversarial discrimination (DAD) scheme. Both modules take full advantage of the semi-supervised learning paradigm: the structure disentanglement benefits from the modeling ability of ViT, and the translator is enhanced by dual discrimination on both the translation processes and the translation results. Comprehensive evaluations demonstrate that our framework can robustly recover photo-realistic hand appearance from diverse marker-containing and even object-occluded datasets. It provides a novel avenue for acquiring bare hand appearance data for other downstream learning problems. The code will be publicly available at https://www.yangangwang.com
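The abstract does not define the two discriminators or how their losses are weighted. The pure-Python sketch below shows one common way to combine two adversarial terms. A result discriminator scores recovered images. A process discriminator is assumed, hypothetically, to score (degraded input, recovered output) pairs, as in conditional GANs. The function names, the weights `w_result` and `w_process`, and the standard binary cross-entropy GAN loss are illustrative assumptions, not the paper's DAD formulation.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for one raw discriminator logit."""
    # Equivalent to -[t*log(sigmoid(x)) + (1 - t)*log(1 - sigmoid(x))]
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits: Sequence[float], target: float) -> float:
    """Average BCE of a batch of logits against a constant real/fake target."""
    return sum(bce_with_logits(x, target) for x in logits) / len(logits)


def discriminator_loss(real_logits: Sequence[float], fake_logits: Sequence[float]) -> float:
    """Standard GAN discriminator loss: push real samples to 1 and fakes to 0.

    Applied separately to each of the two (hypothetical) discriminators.
    """
    return mean_bce(real_logits, 1.0) + mean_bce(fake_logits, 0.0)


def generator_dual_adv_loss(
    result_logits: Sequence[float],
    process_logits: Sequence[float],
    w_result: float = 1.0,   # hypothetical weight for the result term
    w_process: float = 1.0,  # hypothetical weight for the process term
) -> float:
    """Hypothetical generator objective with two discriminators.

    result_logits:  result-discriminator scores of recovered (translated) hand images.
    process_logits: process-discriminator scores of (degraded input, recovered output)
                    pairs; this pairing is an assumption, not taken from the paper.
    The generator is rewarded when both discriminators label its outputs as real.
    """
    return w_result * mean_bce(result_logits, 1.0) + w_process * mean_bce(process_logits, 1.0)


# Usage with made-up logits: an undecided discriminator (logit 0) gives ln 2 per term.
print(round(generator_dual_adv_loss([0.0, 0.0], [0.0]), 4))  # 1.3863
```

Keeping the two terms separately weighted lets either discriminator be emphasized or ablated independently, which is the usual reason to expose such weights as hyperparameters.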