CV · AI · Apr 16, 2019

Disentangling Pose from Appearance in Monochrome Hand Images

arXiv:1904.07528v1 · 3 citations
AI Analysis

This paper addresses the problem of reducing the data-collection requirements of hand pose estimation in computer vision, though the contribution is incremental, since it builds on existing disentanglement methods.

The paper tackles hand pose estimation from monochrome images by disentangling pose from appearance factors such as texture and lighting. A self-disentanglement scheme based on cycle consistency makes it possible to synthesize unseen images, which in turn improves pose estimation performance.

Hand pose estimation from a monocular 2D image is challenging due to variation in lighting, appearance, and background. While some success has been achieved using deep neural networks, they typically require collecting a large dataset that adequately samples all the axes of variation of hand images. It would, therefore, be useful to find a representation of hand pose that is independent of image appearance (such as hand texture, lighting, and background), so that unseen images can be synthesized by mixing pose-appearance combinations. In this paper, we present a novel technique that disentangles the representation of pose from a complementary appearance factor in 2D monochrome images. We supervise this disentanglement process using a network that learns to generate images of hands from specified pose and appearance features. Unlike previous work, we do not require image pairs with a matching pose; instead, we use the pose annotations already available and introduce a novel use of cycle consistency to ensure orthogonality between the factors. Experimental results show that our self-disentanglement scheme successfully decomposes the hand image into pose and complementary appearance features of quality comparable to the method that uses paired data. Additionally, training the model with extra synthesized images of unseen hand-appearance combinations, created by re-mixing pose and appearance factors from different images, can improve 2D pose estimation performance.
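As a rough illustration of the two mechanisms the abstract describes (swapping factors between images and checking the result with cycle consistency, and re-mixing factors to synthesize extra training images), here is a minimal toy sketch in Python. It is not the authors' architecture or loss. Every name (`decode`, `encode_pose`, `encode_appearance`, `cycle_loss`, `remix`, `POSE_DIM`, and the `leak` parameter) is hypothetical, the "image" is just a concatenated vector, and the encoders and generator are fixed functions rather than trained networks.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]

# Hypothetical toy setup: an "image" is the concatenation of a pose code and an
# appearance code. Real hand images and the paper's networks are far richer.
POSE_DIM = 2


@dataclass
class ToyImage:
    pixels: Vector


def decode(pose: Vector, appearance: Vector) -> ToyImage:
    """Hypothetical generator G(pose, appearance) -> image."""
    return ToyImage(pixels=list(pose) + list(appearance))


def encode_pose(img: ToyImage, leak: float = 0.0) -> Vector:
    """Hypothetical pose encoder. `leak` lets appearance bleed into the pose
    code, simulating an imperfectly disentangled encoder."""
    pose = img.pixels[:POSE_DIM]
    app = img.pixels[POSE_DIM:]
    return [p + leak * a for p, a in zip(pose, app)]


def encode_appearance(img: ToyImage) -> Vector:
    """Hypothetical appearance encoder."""
    return img.pixels[POSE_DIM:]


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cycle_loss(x1: ToyImage, x2: ToyImage, leak: float = 0.0) -> float:
    """Swap-and-re-encode check: combine the pose of x1 with the appearance of
    x2, re-encode the mixed image, and measure how far each factor drifted."""
    p1 = encode_pose(x1, leak)
    a2 = encode_appearance(x2)
    mixed = decode(p1, a2)
    p_back = encode_pose(mixed, leak)
    a_back = encode_appearance(mixed)
    return sq_dist(p_back, p1) + sq_dist(a_back, a2)


def remix(dataset: List[Tuple[ToyImage, Vector]]) -> List[Tuple[ToyImage, Vector]]:
    """Synthesize every pose/appearance pairing across distinct images. The pose
    label of the pose-source image carries over to the synthesized image."""
    out = []
    for i, (xi, pose_label) in enumerate(dataset):
        for j, (xj, _) in enumerate(dataset):
            if i != j:
                out.append((decode(encode_pose(xi), encode_appearance(xj)), pose_label))
    return out


# Usage: three toy "images" built from known pose and appearance codes.
x1 = decode([1.0, 0.0], [0.5, 0.5])
x2 = decode([0.0, 1.0], [2.0, -1.0])
x3 = decode([0.5, 0.5], [1.0, 1.0])

print(cycle_loss(x1, x2))            # 0.0 for a perfectly disentangled encoder
print(cycle_loss(x1, x2, leak=0.1))  # approximately 0.05, since appearance leaks into pose

augmented = remix([(x1, [1.0, 0.0]), (x2, [0.0, 1.0]), (x3, [0.5, 0.5])])
print(len(augmented))                # 6 synthesized (image, pose label) pairs
```

With `leak=0.0` the re-encoded factors match exactly and the loss is zero; any leakage of appearance into the pose code shows up as a positive loss, which is the kind of signal a training objective could penalize. In a real system the encoders and generator would be learned, so such a loss would be minimized by gradient descent rather than evaluated on fixed functions.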
