Self-supervision on Unlabelled OR Data for Multi-person 2D/3D Human Pose Estimation
The work provides a viable solution for real-time pose estimation in operating rooms, addressing both the lack of annotated data and the deployment constraints of complex models, though the contribution is incremental because it builds on existing distillation methods.
The paper tackles the problem of 2D/3D human pose estimation in operating rooms by using knowledge distillation with a teacher/student framework on unlabeled data, resulting in a lightweight network that performs on par with a complex teacher network on the MVOR+ dataset.
2D/3D human pose estimation is needed to develop novel intelligent tools for the operating room (OR) that can analyze and support clinical activities. However, the lack of annotated data and the complexity of state-of-the-art pose estimation approaches limit the deployment of such techniques inside the OR. In this work, we propose to use knowledge distillation in a teacher/student framework to harness the knowledge present in a large-scale non-annotated dataset and in an accurate but complex multi-stage teacher network to train a lightweight network for joint 2D/3D pose estimation. The teacher network also exploits the unlabeled data to generate both hard and soft labels that help improve the student predictions. The easily deployable network trained using this effective self-supervision strategy performs on par with the teacher network on MVOR+, an extension of the public MVOR dataset where all persons have been fully annotated, thus providing a viable solution for real-time 2D/3D human pose estimation in the OR.
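To make the idea of combining hard and soft teacher labels concrete, the following is a minimal sketch in plain Python. It is not the loss used in the paper, which the abstract does not specify. It assumes, for illustration only, that soft labels are the teacher's 2D heatmaps, that hard labels are the teacher's keypoints kept only above a confidence threshold and rendered as Gaussian heatmaps, and that the two terms are mixed with a weight `alpha`. The names `gaussian_heatmap`, `mse`, `distillation_loss`, `conf_threshold`, `alpha`, and `sigma` are all hypothetical, and the 3D part of the task is left out.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Render one keypoint at (cx, cy) as a 2D Gaussian heatmap (list of rows)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def mse(a, b):
    """Mean squared error between two heatmaps of identical size."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def distillation_loss(student, teacher_soft, teacher_keypoints,
                      conf_threshold=0.5, alpha=0.5, sigma=1.5):
    """Mix a hard-label term and a soft-label term (illustrative assumption).

    student, teacher_soft: one heatmap per joint.
    teacher_keypoints: one (x, y, confidence) tuple per joint.
    """
    height, width = len(student[0]), len(student[0][0])

    # Soft term: match the teacher's heatmaps directly, averaged over joints.
    soft_term = sum(mse(s, t) for s, t in zip(student, teacher_soft)) / len(student)

    # Hard term: use only teacher keypoints that pass the confidence threshold.
    hard_terms = []
    for heatmap, (x, y, conf) in zip(student, teacher_keypoints):
        if conf < conf_threshold:
            continue  # discard unreliable pseudo-labels
        target = gaussian_heatmap(height, width, x, y, sigma)
        hard_terms.append(mse(heatmap, target))
    hard_term = sum(hard_terms) / len(hard_terms) if hard_terms else 0.0

    return alpha * hard_term + (1.0 - alpha) * soft_term
```

In this sketch, a student that reproduces the teacher's heatmaps exactly incurs zero loss, and joints whose teacher confidence falls below `conf_threshold` contribute only through the soft term. In a real training setup the same structure would typically be written with a tensor library so that gradients can flow to the student.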