CVJul 14, 2020

Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement

arXiv:2007.07053v282 citationsHas Code
AI Analysis

This work addresses the challenge of preserving intrinsic pose information and adapting to view variations for tasks like 3D pose estimation and action recognition, representing an incremental improvement with a novel hybrid method.

The paper tackles the problem of learning a 3D human pose representation by disentangling pose-dependent and view-dependent features using a novel Siamese denoising autoencoder and SeBiReNet, achieving state-of-the-art performance on pose denoising and unsupervised action recognition tasks.

Learning a good 3D human pose representation is important for human pose related tasks, e.g. human 3D pose estimation and action recognition. Within all these problems, preserving the intrinsic pose information and adapting to view variations are two critical issues. In this work, we propose a novel Siamese denoising autoencoder to learn a 3D pose representation by disentangling the pose-dependent and view-dependent feature from the human skeleton data, in a fully unsupervised manner. These two disentangled features are utilized together as the representation of the 3D pose. To consider both the kinematic and geometric dependencies, a sequential bidirectional recursive network (SeBiReNet) is further proposed to model the human skeleton data. Extensive experiments demonstrate that the learned representation 1) preserves the intrinsic information of human pose, 2) shows good transferability across datasets and tasks. Notably, our approach achieves state-of-the-art performance on two inherently different tasks: pose denoising and unsupervised action recognition. Code and models are available at: \url{https://github.com/NIEQiang001/unsupervised-human-pose.git}

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes