CV, Mar 5, 2021

Real-time RGBD-based Extended Body Pose Estimation

arXiv:2103.03663v1, 37 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate full-body pose estimation in applications such as human-computer interaction. The advance is incremental: it builds on an existing body model (SMPL-X), previously published landmark extractors, and the AMASS motion capture dataset.

The paper tackles real-time 3D human pose estimation from RGB-D data. The combined system runs at 30 FPS on a server with a single GPU; its body pose model outperforms state-of-the-art RGB-only methods and matches the accuracy of a slower optimization-based RGB-D solution.

We present a system for real-time RGB-D-based estimation of 3D human pose. We use a parametric 3D deformable human mesh model (SMPL-X) as the representation and focus on real-time estimation of the parameters for body pose, hand pose, and facial expression from a Kinect Azure RGB-D camera. We train estimators of the body pose and facial expression parameters. Both estimators take the output of previously published landmark extractors as input and use custom annotated datasets for supervision, while hand pose is estimated directly by a previously published method. We combine the predictions of these estimators into a temporally smooth human pose. We train the facial expression extractor on a large talking-face dataset, which we annotate with facial expression parameters. For the body pose, we collect and annotate a dataset of 56 people captured by a rig of 5 Kinect Azure RGB-D cameras and use it together with the large AMASS motion capture dataset. Our RGB-D body pose model outperforms state-of-the-art RGB-only methods and matches the accuracy of a slower optimization-based RGB-D solution. The combined system runs at 30 FPS on a server with a single GPU. The code will be available at https://saic-violet.github.io/rgbd-kinect-pose
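The abstract says that per-frame predictions are combined into a temporally smooth pose, but it does not say how. As an illustration only, the sketch below applies an exponential moving average to each parameter group. The EMA itself, the class name `EmaSmoother`, the function `smooth_frame`, the group names, and the `alpha` values are all assumptions made for this example, not the authors' method. Averaging rotation parameters component-wise is also only a rough approximation for small frame-to-frame changes.

```python
from typing import Dict, List, Optional, Sequence


class EmaSmoother:
    """Exponential moving average over a fixed-length parameter vector.

    Hypothetical helper for illustration; not taken from the paper's code.
    """

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight of the newest frame
        self.state: Optional[List[float]] = None

    def update(self, params: Sequence[float]) -> List[float]:
        """Blend a new per-frame prediction into the running estimate."""
        if self.state is None:
            # First frame: nothing to smooth against yet.
            self.state = list(params)
        else:
            if len(params) != len(self.state):
                raise ValueError("parameter vector changed length")
            a = self.alpha
            self.state = [a * p + (1.0 - a) * s
                          for p, s in zip(params, self.state)]
        return list(self.state)


def smooth_frame(
    predictions: Dict[str, Sequence[float]],
    smoothers: Dict[str, EmaSmoother],
) -> Dict[str, List[float]]:
    """Smooth each parameter group (assumed names) with its own filter."""
    return {name: smoothers[name].update(values)
            for name, values in predictions.items()}


if __name__ == "__main__":
    # Assumed group names and alphas; real vectors would be much longer.
    smoothers = {
        "body_pose": EmaSmoother(alpha=0.5),
        "hand_pose": EmaSmoother(alpha=0.5),
        "expression": EmaSmoother(alpha=0.5),
    }
    frames = [
        {"body_pose": [0.0, 2.0], "hand_pose": [1.0], "expression": [0.0]},
        {"body_pose": [2.0, 0.0], "hand_pose": [3.0], "expression": [4.0]},
    ]
    for frame in frames:
        print(smooth_frame(frame, smoothers))
```

A smaller `alpha` gives a smoother but laggier pose, which matters for a system targeting 30 FPS interaction; using a separate filter per group lets fast-moving hands be smoothed less than the torso.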

Code Implementations: 1 repo
