CV · AI · Jul 27, 2023

Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving

arXiv:2307.14889v1 · 20 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This paper addresses accurate 3D pose estimation in autonomous driving, which is crucial for safety. The contribution is incremental: it adapts existing techniques to a specific domain.

The paper tackles the problem of 3D human pose estimation for autonomous vehicles by proposing a weakly supervised method using camera and LiDAR fusion, achieving up to ~13% improvement over state-of-the-art results on the Waymo Open Dataset.

Accurate 3D human pose estimation (3D HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios. Promising results of 3D HPE have been gained in several domains such as human-computer interaction, robotics, sports and medical analytics, often based on data collected in well-controlled laboratory environments. Nevertheless, the transfer of 3D HPE methods to AVs has received limited research attention, due to the challenges posed by obtaining accurate 3D pose annotations and the limited suitability of data from other domains. We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data. The weakly supervised setting enables training on the target datasets without any 2D/3D keypoint labels by using an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR to image projections. Our approach outperforms state-of-the-art results by up to $\sim$ 13% on the Waymo Open Dataset in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.
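
The abstract mentions pseudo labels generated from LiDAR to image projections but gives no procedure. The following is a minimal sketch of one plausible reading, not the authors' implementation: LiDAR points are assumed to be already expressed in the camera frame, the camera is assumed to follow a simple pinhole model, and each 2D keypoint from the off-the-shelf extractor is paired with the LiDAR point whose projection lands closest to it. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `max_px` threshold are all hypothetical.

```python
import math


def project_points(points_cam, fx, fy, cx, cy):
    """Project 3D points (camera frame, z forward) to pixels with a pinhole model.

    Returns a list of ((u, v), (x, y, z)) pairs for points in front of the camera.
    """
    projected = []
    for x, y, z in points_cam:
        if z <= 0:  # behind the camera, so it has no valid projection
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        projected.append(((u, v), (x, y, z)))
    return projected


def pseudo_labels(keypoints_2d, projected, max_px=10.0):
    """Assign each 2D keypoint the 3D point whose projection is nearest in pixels.

    A keypoint with no projected point within max_px (assumed threshold)
    gets None, i.e. no pseudo label.
    """
    labels = []
    for ku, kv in keypoints_2d:
        best, best_dist = None, max_px
        for (u, v), point in projected:
            dist = math.hypot(u - ku, v - kv)
            if dist <= best_dist:
                best, best_dist = point, dist
        labels.append(best)
    return labels


# Toy data: two visible LiDAR points and one behind the camera.
lidar_points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 0.0, -5.0)]
projected = project_points(lidar_points, fx=1000.0, fy=1000.0, cx=960.0, cy=640.0)

# Two detected 2D joints: one near a projected point, one far from all of them.
keypoints = [(962.0, 641.0), (1500.0, 100.0)]
labels = pseudo_labels(keypoints, projected)
```

In this toy setup the first joint receives a 3D pseudo label and the second receives none, which mirrors the sparsity of LiDAR returns on a pedestrian: only some joints can be supervised this way in any given frame.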

