CV · Apr 23, 2021

Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective

arXiv:2104.11536v116.6178 citations

Originality: Synthesis-oriented

AI Analysis

The paper offers an in-depth review that helps researchers understand and advance the field, but as a survey its contribution is incremental.

This paper provides a comprehensive survey categorizing and analyzing mainstream deep learning approaches for monocular 2D and 3D human pose estimation since 2014, summarizing differences, solutions for challenges, benchmarks, and performance metrics.

Estimating human pose from a monocular camera is an emerging research topic in the computer vision community with many applications. Recently, thanks to deep learning technologies, a significant body of research has greatly advanced monocular human pose estimation in both 2D and 3D. Although some works have summarized the different approaches, it remains challenging for researchers to gain an in-depth view of how these approaches work. In this paper, we provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem. We categorize the mainstream and milestone approaches since 2014 under unified frameworks. By systematically summarizing the differences and connections between these approaches, we further analyze the solutions for challenging cases, such as the lack of data, the inherent ambiguity between 2D and 3D, and complex multi-person scenarios. We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches. Finally, we discuss the challenges and reflect on promising directions for future research. We believe this survey will provide readers with a deep and insightful understanding of monocular human pose estimation.
