CV · Sep 17, 2021

Unsupervised View-Invariant Human Posture Representation

Faegheh Sardari, Björn Ommer, Majid Mirmehdi

arXiv:2109.08730v2 · 3.74 citations

Originality: Highly original

AI Analysis

This paper addresses view-invariant human posture analysis for applications such as action recognition in real-world scenarios where 3D data is hard to obtain, and it offers a novel unsupervised approach.

The paper tackles the problem of learning view-invariant human pose representations without 3D skeleton data. It proposes an unsupervised method that works from 2D images, significantly improves cross-view action classification accuracy on NTU RGB+D, and yields the first unsupervised results on the QMAR dataset.

Most recent view-invariant action recognition and performance assessment approaches rely on a large amount of annotated 3D skeleton data to extract view-invariant features. However, acquiring 3D skeleton data can be cumbersome, if not impractical, in in-the-wild scenarios. To overcome this problem, we present a novel unsupervised approach that learns to extract a view-invariant 3D human pose representation from a 2D image without using 3D joint data. Our model is trained by exploiting the intrinsic view-invariant properties of human pose between simultaneous frames from different viewpoints and their equivariant properties between augmented frames from the same viewpoint. We evaluate the learned view-invariant pose representations for two downstream tasks. We perform comparative experiments that show improvements on the state-of-the-art unsupervised cross-view action classification accuracy on NTU RGB+D by a significant margin, on both RGB and depth images. We also show the efficiency of transferring the learned representations from NTU RGB+D to obtain the first-ever unsupervised cross-view and cross-subject rank correlation results on the multi-view human movement quality dataset, QMAR, and marginally improve on the state-of-the-art supervised results for this dataset. We also carry out ablation studies to examine the contributions of the different components of our proposed network.
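
The two training signals named in the abstract, invariance across simultaneous viewpoints and equivariance under augmentation, can be illustrated with a toy sketch. This is not the paper's network or its loss functions, which the abstract does not specify. The hand-written encoder `pairwise_distances`, the contrast encoder `raw_coordinates`, the in-plane `rotate` used as a stand-in for a viewpoint change, and the `scale` augmentation are all hypothetical choices made only to show what the two properties mean.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = List[Tuple[float, float]]  # toy "frame": a list of 2D joint positions


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def invariance_loss(embed: Callable[[Frame], List[float]],
                    frame_a: Frame, frame_b: Frame) -> float:
    # Two views of the same instant should map to the same embedding.
    return squared_distance(embed(frame_a), embed(frame_b))


def equivariance_loss(embed: Callable[[Frame], List[float]], frame: Frame,
                      augment_frame: Callable[[Frame], Frame],
                      augment_embedding: Callable[[List[float]], List[float]]) -> float:
    # Augmenting the input should change the embedding in a known, matching way.
    return squared_distance(embed(augment_frame(frame)),
                            augment_embedding(embed(frame)))


def pairwise_distances(frame: Frame) -> List[float]:
    # Hypothetical encoder: distances between every pair of joints.
    return [math.dist(p, q) for i, p in enumerate(frame) for q in frame[i + 1:]]


def raw_coordinates(frame: Frame) -> List[float]:
    # Contrast encoder that is NOT invariant to the toy view change.
    return [c for joint in frame for c in joint]


def rotate(frame: Frame, angle: float) -> Frame:
    # Toy stand-in for a viewpoint change: in-plane rotation about the origin.
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in frame]


def scale(frame: Frame, factor: float) -> Frame:
    # Toy augmentation: uniform scaling of the joint positions.
    return [(factor * x, factor * y) for x, y in frame]


pose = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
other_view = rotate(pose, math.pi / 3)

inv_good = invariance_loss(pairwise_distances, pose, other_view)
inv_bad = invariance_loss(raw_coordinates, pose, other_view)
equi = equivariance_loss(pairwise_distances, pose,
                         lambda f: scale(f, 2.0),
                         lambda e: [2.0 * v for v in e])
```

In this toy setting the distance-based encoder drives both losses to approximately zero, while the raw-coordinate encoder incurs a clearly positive invariance loss. A learned encoder would instead be trained by minimising terms of this general shape over many frame pairs.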
