CV · AI · ML · Aug 31, 2016

Human Pose Estimation in Space and Time using 3D CNN

arXiv:1609.00036v3 · 26 citations
Originality: Incremental advance
AI Analysis

The paper addresses accurate 3D human pose estimation from video for computer vision applications, but the advance is incremental: it extends existing CNN methods to 3D convolutions.

The paper tackles 3D human pose estimation from monocular RGB videos by using 3D CNNs to encode time as an additional dimension, achieving state-of-the-art performance on the Human3.6M dataset.

This paper explores the ability of convolutional neural networks to handle a task that is easy for humans: perceiving the 3D pose of a human body from varying viewpoints. In our approach, however, we are restricted to a monocular vision system. To this end, we apply a convolutional neural network to RGB videos and extend it to three-dimensional convolutions. This is done by encoding the time dimension of the video as the third dimension of the convolution and by directly regressing human body joint positions in 3D coordinate space. The results show that such a network achieves state-of-the-art performance on the Human3.6M dataset, demonstrating that temporal data can be successfully represented by an additional dimension in the convolutional operation.
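The core idea described above can be sketched in plain Python. The sketch treats the frame index as a third convolution axis and regresses joint coordinates directly. It is a minimal illustration under stated assumptions, not the authors' architecture. The following are all hypothetical choices made for brevity: the single-channel (grayscale) input in place of RGB, one 3×3×3 kernel, a single linear regression layer, the clip size, the joint count of 17, and random untrained weights.

```python
import random


def conv3d(clip, kernel):
    """Valid (no padding, stride 1) 3D cross-correlation, as computed by CNN
    layers. clip[t][y][x] is a single-channel video (an assumption; RGB would
    add a channel sum). kernel[dt][dy][dx] spans time as well as space, so
    each output value mixes information from neighbouring frames."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def relu(volume):
    """Element-wise ReLU over a [t][y][x] volume."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in volume]


def flatten(volume):
    """Flatten a [t][y][x] volume into a single feature vector."""
    return [v for plane in volume for row in plane for v in row]


def regress_joints(features, weights, bias, num_joints):
    """Hypothetical linear head: num_joints * 3 outputs read as (x, y, z) per joint."""
    flat = [sum(w * f for w, f in zip(w_row, features)) + b
            for w_row, b in zip(weights, bias)]
    return [tuple(flat[3 * j:3 * j + 3]) for j in range(num_joints)]


if __name__ == "__main__":
    random.seed(0)
    T, H, W, J = 5, 8, 8, 17  # hypothetical clip size and joint count
    clip = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
    kernel = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(3)]
    feats = flatten(relu(conv3d(clip, kernel)))  # 3 * 6 * 6 = 108 features
    weights = [[random.uniform(-0.1, 0.1) for _ in feats] for _ in range(3 * J)]
    pose = regress_joints(feats, weights, [0.0] * (3 * J), J)
    print(len(feats), len(pose))  # 108 17
```

Because the kernel has a temporal extent `kt`, a clip of `T` frames yields `T - kt + 1` output frames. A kernel such as `[[[-1.0]], [[1.0]]]` responds to change between consecutive frames, which a 2D kernel applied to each frame independently cannot do. This is the sense in which the extra dimension lets the convolution represent time.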
