CV · AI · Dec 16, 2022

Multi-person 3D pose estimation from unlabelled data

arXiv:2212.08731v3 · 5 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses 3D pose estimation for multiple people in settings such as surveillance or sports. Its self-supervised approach is an incremental advance that builds on existing deep learning methods.

The paper tackles multi-person 3D pose estimation from unlabeled multi-view RGB data by addressing cross-view person identification and robust 3D reconstruction, using a self-supervised Graph Neural Network and Multilayer Perceptron to avoid the need for large annotated datasets.

Its numerous applications make multi-human 3D pose estimation a remarkably impactful area of research. Nevertheless, assuming a multiple-view system composed of several regular RGB cameras, 3D multi-pose estimation presents several challenges. First of all, each person must be uniquely identified in the different views to separate the 2D information provided by the cameras. Secondly, the 3D pose estimation process from the multi-view 2D information of each person must be robust against noise and potential occlusions in the scenario. In this work, we address these two challenges with the help of deep learning. Specifically, we present a model based on Graph Neural Networks capable of predicting the cross-view correspondence of the people in the scenario along with a Multilayer Perceptron that takes the 2D points to yield the 3D poses of each person. These two models are trained in a self-supervised manner, thus avoiding the need for large datasets with 3D annotations.
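To make the two stages in the abstract concrete, here is a minimal sketch of the data flow between them. It is not the authors' implementation. It assumes the cross-view model outputs a match score in [0, 1] for pairs of 2D detections, and that detections are grouped into persons by thresholding those scores and taking connected components. The function names (`group_detections`, `build_lifting_input`), the threshold, and the flattened input layout for the 2D-to-3D network are all hypothetical choices made for illustration.

```python
def group_detections(num_detections, pair_scores, threshold=0.5):
    """Group 2D detections from different views into persons.

    pair_scores maps an index pair (i, j) to an assumed match score in [0, 1],
    standing in for the output of a cross-view correspondence model.
    Grouping by connected components is an illustrative assumption.
    """
    parent = list(range(num_detections))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), score in pair_scores.items():
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups = {}
    for i in range(num_detections):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def build_lifting_input(group, view_of, keypoints_2d, num_views, num_joints):
    """Flatten one person's 2D keypoints from all views into a single vector.

    Views in which the person was not detected are left as zeros.
    This fixed layout is a hypothetical input format for a 2D-to-3D network.
    """
    features = [0.0] * (num_views * num_joints * 2)
    for det in group:
        offset = view_of[det] * num_joints * 2
        for k, (x, y) in enumerate(keypoints_2d[det]):
            features[offset + 2 * k] = x
            features[offset + 2 * k + 1] = y
    return features


# Five detections across three cameras; two people in the scene.
view_of = [0, 0, 1, 1, 2]
keypoints_2d = [[(10.0, 20.0)], [(50.0, 60.0)], [(12.0, 22.0)],
                [(52.0, 62.0)], [(14.0, 24.0)]]
pair_scores = {(0, 2): 0.9, (1, 3): 0.8, (2, 4): 0.7, (0, 3): 0.1, (1, 2): 0.2}

persons = group_detections(len(view_of), pair_scores)
inputs = [build_lifting_input(g, view_of, keypoints_2d, num_views=3, num_joints=1)
          for g in persons]
```

In this toy example the first person is seen by all three cameras and the second only by two, so the second person's input vector keeps zeros in the slot for the missing view. How the actual models handle missing views and noisy detections is not specified in the abstract.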

Code Implementations: 1 repo
