CV · Jun 8, 2021

Contrastive Representation Learning for Hand Shape Estimation

arXiv:2106.04324v2 · 56 citations
AI Analysis

This is an incremental improvement for hand shape estimation in computer vision, benefiting applications like human-computer interaction.

This work tackled monocular hand shape estimation by extending momentum contrastive learning and introducing a structured hand image dataset called HanCo, resulting in a 4.7% reduction in mesh error and a 3.6% improvement in F-score compared to an ImageNet pretrained baseline.

This work presents improvements in monocular hand shape estimation by building on recent advances in unsupervised learning. We extend momentum contrastive learning and contribute a structured collection of hand images, well suited for visual representation learning, which we call HanCo. We find that the representation learned by established contrastive learning methods can be improved significantly by exploiting advanced background removal techniques and multi-view information. These allow us to generate more diverse instance pairs than those obtained by the augmentations commonly used in exemplar-based approaches. Our method leads to a more suitable representation for the hand shape estimation task and shows a 4.7% reduction in mesh error and a 3.6% improvement in F-score compared to an ImageNet-pretrained baseline. We make our benchmark dataset publicly available to encourage further research in this direction.
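
To make the idea of momentum contrastive learning with richer instance pairs concrete, here is a minimal sketch of a generic MoCo-style objective in plain Python. It is not the authors' implementation. The function names, the temperature of 0.2, and the momentum of 0.999 are assumptions chosen for illustration, and the embeddings are toy vectors standing in for encoder outputs. The positive key is imagined to come from a second view (or a background-swapped copy) of the same hand pose, which is the kind of pair the abstract describes.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, negative_keys, temperature=0.2):
    """InfoNCE loss for one query: pull the positive key close,
    push the negative keys away. The temperature is an assumed value."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key))]
    logits += [dot(q, l2_normalize(n)) for n in negative_keys]
    logits = [z / temperature for z in logits]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability of picking the positive (index 0).
    return log_sum - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Move the key encoder's parameters slowly toward the query encoder's.
    Parameters are flat lists of floats here; m is an assumed value."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Toy embeddings: the positive is a hypothetical second view of the same pose.
query = [1.0, 0.0, 0.0, 0.0]
same_pose_other_view = [0.9, 0.1, 0.0, 0.0]
negatives = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
loss = info_nce_loss(query, same_pose_other_view, negatives)
```

In this framing, the contribution described above changes how the positive key is produced (another camera view or a different background for the same pose, not only an augmented copy of the same image), while the loss and the momentum update keep their usual form.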
