PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation
This work addresses the challenge of accurately estimating 3D human poses in everyday interactive scenarios, a problem relevant to many computer vision applications, and represents an incremental improvement over existing methods.
The paper tackles multi-person monocular 3D pose estimation by exploiting dependencies between interacting individuals. It uses a recurrent architecture called PI-Net to refine each person's pose, and demonstrates its effectiveness by setting a new state of the art on the MuPoTS dataset.
Recent literature has addressed the monocular 3D pose estimation task very satisfactorily. In these studies, different persons are usually treated as independent pose instances to estimate. However, in many everyday situations, people are interacting, and the pose of an individual depends on the pose of their interactees. In this paper, we investigate how to exploit this dependency to enhance current (and possibly future) deep networks for monocular 3D pose estimation. Our pose interacting network, or PI-Net, inputs the initial pose estimates of a variable number of interactees into a recurrent architecture used to refine the pose of the person-of-interest. Evaluating such a method is challenging due to the limited availability of public annotated multi-person 3D human pose datasets. We demonstrate the effectiveness of our method on the MuPoTS dataset, setting a new state of the art on it. Qualitative results on other multi-person datasets (for which 3D pose ground truth is not available) showcase the proposed PI-Net. PI-Net is implemented in PyTorch and the code will be made available upon acceptance of the paper.
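To make the refinement idea concrete, the following is a minimal PyTorch sketch of how initial pose estimates from a variable number of interactees could be fed through a recurrent unit to refine the pose of the person-of-interest. It is an illustration only and not the authors' PI-Net: the class name `PoseRefinerSketch`, the choice of a single-layer GRU, the residual output, the 17-joint skeleton, and the hidden size are all assumptions made for this example.

```python
import torch
import torch.nn as nn


class PoseRefinerSketch(nn.Module):
    """Hypothetical pose refiner (illustrative; not the published PI-Net)."""

    def __init__(self, num_joints: int = 17, hidden_size: int = 128):
        super().__init__()
        # Assumption: each pose is num_joints 3D points, flattened to one vector.
        self.pose_dim = num_joints * 3
        self.encode_poi = nn.Linear(self.pose_dim, hidden_size)
        self.gru = nn.GRU(self.pose_dim, hidden_size, batch_first=True)
        self.decode = nn.Linear(hidden_size, self.pose_dim)

    def forward(self, poi_pose: torch.Tensor, interactee_poses: torch.Tensor) -> torch.Tensor:
        # poi_pose: (J, 3) initial estimate for the person-of-interest.
        # interactee_poses: (N, J, 3) initial estimates; N may differ per call.
        if interactee_poses.shape[0] == 0:
            # No interactees: nothing to condition on, keep the initial pose.
            return poi_pose
        # The person-of-interest pose initialises the recurrent state.
        h0 = torch.tanh(self.encode_poi(poi_pose.reshape(1, self.pose_dim)))
        h0 = h0.unsqueeze(0)  # (num_layers=1, batch=1, hidden)
        # Interactees are consumed one by one as a length-N sequence.
        seq = interactee_poses.reshape(1, -1, self.pose_dim)  # (1, N, J*3)
        _, h_n = self.gru(seq, h0)
        # Assumption: predict a residual correction to the initial pose.
        residual = self.decode(h_n[-1]).reshape(poi_pose.shape)
        return poi_pose + residual


if __name__ == "__main__":
    model = PoseRefinerSketch()
    poi = torch.randn(17, 3)
    for n in (1, 2, 5):  # variable number of interactees
        refined = model(poi, torch.randn(n, 17, 3))
        print(n, tuple(refined.shape))
```

Because the interactees are treated as a sequence, the same module handles any number of them without padding. The order in which interactees are presented matters for a recurrent unit; this sketch leaves that order unspecified.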