CV · Feb 26, 2018

2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning

arXiv:1802.09232v2 · 532 citations
AI Analysis

This work integrates pose estimation and action recognition, two computer vision tasks that are usually handled separately, into a single efficient architecture. The contribution is an incremental improvement that comes from combining the tasks rather than from defining a new one.

The paper addresses 2D/3D pose estimation from still images and action recognition from video sequences with a single multitask deep learning framework. It reports state-of-the-art results on four datasets (MPII, Human3.6M, Penn Action, and NTU) and finds that end-to-end optimization yields significantly higher accuracy than training the tasks separately.

Action recognition and human pose estimation are closely related, but the two problems are generally handled as distinct tasks in the literature. In this work, we propose a multitask framework for joint 2D and 3D pose estimation from still images and human action recognition from video sequences. We show that a single architecture can solve the two problems efficiently and still achieve state-of-the-art results. Additionally, we demonstrate that end-to-end optimization leads to significantly higher accuracy than separate learning. The proposed architecture can be trained seamlessly with data from different categories simultaneously. The reported results on four datasets (MPII, Human3.6M, Penn Action and NTU) demonstrate the effectiveness of our method on the targeted tasks.
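To make the idea of joint, end-to-end optimization concrete, here is a minimal sketch of a combined objective for the two tasks. The abstract does not specify the loss functions or their weighting, so everything below is an assumption for illustration: the function names, the mean absolute error on joint coordinates, the softmax cross-entropy on action logits, and the `pose_weight` / `action_weight` parameters are all hypothetical and not taken from the paper.

```python
import math

def pose_loss(pred_joints, true_joints):
    """Mean absolute error over all joint coordinates (assumed loss choice)."""
    total, count = 0.0, 0
    for pred, true in zip(pred_joints, true_joints):
        for p, t in zip(pred, true):
            total += abs(p - t)
            count += 1
    return total / count

def action_loss(logits, label):
    """Softmax cross-entropy for a single clip's action logits."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]

def multitask_loss(pred_joints, true_joints, logits, label,
                   pose_weight=1.0, action_weight=1.0):
    """Weighted sum of the two task losses (weights are hypothetical)."""
    return (pose_weight * pose_loss(pred_joints, true_joints)
            + action_weight * action_loss(logits, label))

# Toy data: two 3D joints and four action classes (arbitrary sizes).
true_joints = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
pred_joints = [[0.5, 0.0, 0.0], [1.0, 2.0, 2.5]]
logits = [0.0, 0.0, 0.0, 0.0]
total = multitask_loss(pred_joints, true_joints, logits, label=2)
```

In a real training loop, a single scalar like `total` would be backpropagated through both task heads and any shared layers, which is the general sense in which the two tasks are optimized together rather than separately.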
