MEBOW: Monocular Estimation of Body Orientation In the Wild
This work provides a valuable resource for researchers and developers working on human behavior understanding in applications like robotics and autonomous driving, offering a cost-effective way to improve 3D pose estimation, particularly in challenging real-world conditions.
This paper introduces COCO-MEBOW, a large-scale dataset with 130K body orientation labels for 55K images, addressing the challenge of body orientation estimation in the wild. The dataset significantly improves the performance and robustness of human body orientation models and, when used in a novel triple-source training solution, substantially outperforms state-of-the-art dual-source methods for monocular 3D human pose estimation.
Body orientation estimation provides crucial visual cues in many applications, including robotics and autonomous driving. It is particularly desirable when the 3-D pose is difficult to infer due to poor image resolution, occlusion, or indistinguishable body parts. We present COCO-MEBOW (Monocular Estimation of Body Orientation in the Wild), a new large-scale dataset for orientation estimation from a single in-the-wild image. The body-orientation labels for around 130K human bodies within 55K images from the COCO dataset have been collected using an efficient and high-precision annotation pipeline. We also validate the benefits of the dataset. First, we show that our dataset can substantially improve the performance and the robustness of a human body orientation estimation model, the development of which was previously limited by the scale and diversity of the available training data. Additionally, we present a novel triple-source solution for 3-D human pose estimation, where 3-D pose labels, 2-D pose labels, and our body-orientation labels are all used in joint training. Our model significantly outperforms state-of-the-art dual-source solutions for monocular 3-D human pose estimation, where training uses only 3-D pose labels and 2-D pose labels. This substantiates an important advantage of MEBOW for 3-D human pose estimation, which is particularly appealing because the per-instance labeling cost for body orientations is far less than that for 3-D poses. The work demonstrates the high potential of MEBOW in addressing real-world challenges that involve understanding human behaviors. Further information about this work is available at https://chenyanwu.github.io/MEBOW/.
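To make the idea of joint training on three label sources concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each training sample carries only some of the three label types, and that a per-sample loss is a weighted sum over the terms that are present. The function names `angular_error_deg` and `triple_source_loss`, the use of degrees for orientation, and the default weights are all hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def angular_error_deg(pred_deg: float, target_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180).

    Orientation is circular, so 350 and 10 degrees are 20 degrees apart, not 340.
    """
    diff = (pred_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)


def triple_source_loss(
    loss_3d: Optional[float],
    loss_2d: Optional[float],
    loss_orient: Optional[float],
    weights: Sequence[float] = (1.0, 1.0, 1.0),  # hypothetical weights
) -> float:
    """Weighted sum over whichever label sources this sample provides.

    A term is None when the sample has no label of that type, for example
    a COCO image with 2-D pose and orientation labels but no 3-D pose.
    """
    total = 0.0
    for term, weight in zip((loss_3d, loss_2d, loss_orient), weights):
        if term is not None:
            total += weight * term
    return total


# A sample with 2-D pose and orientation labels only (no 3-D pose label).
orientation_term = angular_error_deg(350.0, 10.0) / 180.0  # scaled to [0, 1]
sample_loss = triple_source_loss(None, 0.5, orientation_term)
```

Under these assumptions, samples from a 3-D pose dataset and samples from an orientation-labeled dataset can share one training loop, with each sample contributing only the loss terms for which it has labels.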