CV · Jan 15, 2020

Lightweight 3D Human Pose Estimation Network Training Using Teacher-Student Learning

arXiv:2001.05097v1 · 44 citations
AI Analysis

This work addresses efficient and accurate 3D human pose estimation for real-time applications on mobile devices. It is an incremental improvement focused on lightweight design.

The paper tackles 3D human pose estimation from a single RGB camera by proposing MoVNect, a lightweight deep neural network that uses teacher-student learning for training and real-time post-processing for stability. It achieves high accuracy and fast inference, demonstrated on the Human3.6M dataset and a mobile application.
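The idea of stabilizing per-frame output can be illustrated with a minimal sketch. The summary does not say which filter the paper uses, so the exponential moving average below is a stand-in chosen for brevity, and the names `JointSmoother` and `alpha` are hypothetical.

```python
import numpy as np


class JointSmoother:
    """Exponential moving average over per-frame 3D joint positions.

    Assumption: this is an illustrative filter, not the paper's
    actual post-processing step.
    """

    def __init__(self, alpha=0.5):
        # alpha near 1 follows new frames closely; near 0 smooths heavily.
        self.alpha = alpha
        self.state = None

    def update(self, joints):
        """Blend a new (num_joints, 3) frame into the running estimate."""
        joints = np.asarray(joints, dtype=float)
        if self.state is None:
            # First frame: nothing to blend with yet.
            self.state = joints.copy()
        else:
            self.state = self.alpha * joints + (1.0 - self.alpha) * self.state
        return self.state
```

A smaller `alpha` gives steadier output at the cost of more lag, which is the usual trade-off for any causal smoothing filter in a real-time pipeline.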

We present MoVNect, a lightweight deep neural network to capture 3D human pose using a single RGB camera. To improve the overall performance of the model, we apply knowledge distillation based on teacher-student learning to 3D human pose estimation. Real-time post-processing makes the CNN output yield temporally stable 3D skeletal information, which can be used in applications directly. We implement a 3D avatar application running on mobile devices in real time to demonstrate that our network achieves both high accuracy and fast inference time. Extensive evaluations show the advantages of our lightweight model with the proposed training method over previous 3D pose estimation methods on the Human3.6M dataset and mobile devices.
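As a rough sketch of how teacher-student training can work in general, the student's loss can blend an error against the ground truth with an error against the teacher's prediction. The abstract does not give the paper's loss, so the mean-squared-error form, the function name `distillation_loss`, and the weight `alpha` are assumptions made for illustration.

```python
import numpy as np


def distillation_loss(student, teacher, target, alpha=0.5):
    """Weighted sum of a ground-truth term and a teacher-mimicking term.

    Assumption: both terms use mean squared error; the paper's exact
    formulation may differ.
    """
    student = np.asarray(student, dtype=float)
    teacher = np.asarray(teacher, dtype=float)
    target = np.asarray(target, dtype=float)

    # How far the student is from the annotated ground truth.
    gt_term = np.mean((student - target) ** 2)
    # How far the student is from the (frozen) teacher's output.
    teacher_term = np.mean((student - teacher) ** 2)

    return alpha * gt_term + (1.0 - alpha) * teacher_term
```

With `alpha=1.0` this reduces to ordinary supervised training, and lowering `alpha` shifts weight toward imitating the teacher.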
