RO | LG | Jan 28, 2024

Towards a large-scale fused and labeled dataset of human pose while interacting with robots in shared urban areas

arXiv:2402.10077v1 | 1 citation | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work provides a dataset for improving socially aware robot navigation, but the contribution is incremental: it repurposes existing datasets and applies an existing method (YOLOv7), adding only a new metric.

The paper addresses the lack of a dataset for human pose estimation in human-robot interactions in shared urban areas by creating a fused and labeled dataset from MOT17 and NCLT. It also introduces a new metric, Mean Scaled Joint Error (MSJE), to overcome the distance bias of MPJPE in pose estimation. In the reported results, YOLOv7 achieves MSJE values as low as 3.38 in outdoor scenarios but as high as 25.3 in challenging indoor scenes.

Over the last decade, Autonomous Delivery Robots (ADRs) have transformed conventional delivery methods, responding to the growing e-commerce demand. However, the readiness of ADRs to navigate safely among pedestrians in shared urban areas remains an open question. We contend that there are crucial research gaps in understanding their interactions with pedestrians in such environments. Human Pose Estimation is a vital stepping stone for various downstream applications, including pose prediction and socially aware robot path-planning. Yet, the absence of an enriched and pose-labeled dataset capturing human-robot interactions in shared urban areas hinders this objective. In this paper, we bridge this gap by repurposing, fusing, and labeling two datasets, MOT17 and NCLT, focused on pedestrian tracking and Simultaneous Localization and Mapping (SLAM), respectively. The resulting unique dataset represents thousands of real-world indoor and outdoor human-robot interaction scenarios. Leveraging YOLOv7, we obtained visual and numeric human pose outputs and provided ground truth poses through manual annotation. To overcome the distance bias present in the traditional Mean Per Joint Position Error (MPJPE) metric, this study introduces a novel human pose estimation error metric, Mean Scaled Joint Error (MSJE), which incorporates bounding box dimensions. Findings demonstrate that YOLOv7 effectively estimates human pose in both datasets. However, it exhibits weaker performance in specific scenarios, such as indoor, crowded scenes with a focused light source, where MPJPE and MSJE are recorded as 10.89 and 25.3, respectively. In contrast, YOLOv7 performs better in single-person estimation (NCLT seq 2) and outdoor scenarios (MOT17 seq 1), achieving MSJE values of 5.29 and 3.38, respectively.
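The distance bias mentioned in the abstract can be illustrated with a small sketch. The abstract says only that MSJE incorporates bounding box dimensions and does not give the formula, so the version below is an assumption for illustration: it divides MPJPE by the bounding-box diagonal. The function names `mpjpe` and `scaled_joint_error` are hypothetical, the keypoints are made-up pixel coordinates, and this normalization is not expected to reproduce the values reported in the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of one joint


def mpjpe(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean per-joint position error: mean Euclidean distance over joints."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def scaled_joint_error(pred: Sequence[Point], truth: Sequence[Point],
                       box_w: float, box_h: float) -> float:
    """Hypothetical scaled error: MPJPE divided by the bounding-box diagonal.

    ASSUMPTION: the paper's actual MSJE formula is not given in the abstract;
    this is just one plausible way to fold box dimensions into the error.
    """
    diagonal = math.hypot(box_w, box_h)
    if diagonal == 0:
        raise ValueError("bounding box must have non-zero size")
    return mpjpe(pred, truth) / diagonal


# The same pose error seen close to the camera and at twice the distance
# (all pixel coordinates halved). Values are invented for illustration.
near_truth = [(0.0, 0.0), (100.0, 200.0)]
near_pred = [(10.0, 0.0), (110.0, 200.0)]
far_truth = [(0.0, 0.0), (50.0, 100.0)]
far_pred = [(5.0, 0.0), (55.0, 100.0)]

print(mpjpe(near_pred, near_truth), mpjpe(far_pred, far_truth))
print(scaled_joint_error(near_pred, near_truth, 100.0, 200.0),
      scaled_joint_error(far_pred, far_truth, 50.0, 100.0))
```

In this toy case the raw MPJPE halves when the person appears half as large, although the pose is equally wrong relative to the body. The scaled error is the same for both views, which is the kind of distance invariance a box-normalized metric aims for.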
