Proactive Multi-Camera Collaboration For 3D Human Pose Estimation
The work addresses occlusion and limited capture space in multi-camera human motion capture, with applications such as surveillance and robotics, and represents an incremental improvement over existing active camera methods.
The paper tackles 3D human pose estimation in dynamic crowds by proposing a multi-agent reinforcement learning scheme with proactive camera collaboration. The method outperforms fixed and active baselines across scenarios with different numbers of cameras and humans.
This paper presents a multi-agent reinforcement learning (MARL) scheme for proactive multi-camera collaboration in 3D human pose estimation in dynamic human crowds. Traditional fixed-viewpoint multi-camera solutions for human motion capture (MoCap) are limited in capture space and susceptible to dynamic occlusions. Active camera approaches proactively control camera poses to find optimal viewpoints for 3D reconstruction. However, current methods still face challenges with credit assignment and environment dynamics. To address these issues, our method introduces a novel Collaborative Triangulation Contribution Reward (CTCR) that improves convergence and alleviates the multi-agent credit assignment issues that arise when 3D reconstruction accuracy is used as the shared reward. Additionally, we jointly train our model with multiple world dynamics learning tasks to better capture environment dynamics and encourage anticipatory behaviors for occlusion avoidance. We evaluate our method in four photo-realistic UE4 environments to assess its validity and generalizability. Empirical results show that our method outperforms fixed and active baselines in various scenarios with different numbers of cameras and humans.
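The credit assignment issue mentioned above can be illustrated with a small sketch. This is not the CTCR formulation, which the abstract does not specify. It assumes a simpler leave-one-out scheme in which each camera's reward is the increase in triangulation error when that camera is removed. The function names `project`, `triangulate`, and `leave_one_out_rewards`, the camera layout, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate(proj_mats, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right-singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def leave_one_out_rewards(proj_mats, points_2d, target):
    """Hypothetical per-camera reward (an assumption, not the paper's CTCR).

    reward[i] = error without camera i - error with all cameras.
    A positive value means camera i helps; a negative value means it hurts.
    Requires at least three cameras so that two remain after removal.
    """
    n = len(proj_mats)
    full_err = np.linalg.norm(triangulate(proj_mats, points_2d) - target)
    rewards = np.zeros(n)
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        est = triangulate([proj_mats[j] for j in keep],
                          [points_2d[j] for j in keep])
        rewards[i] = np.linalg.norm(est - target) - full_err
    return rewards


if __name__ == "__main__":
    # Four cameras with identity rotation and intrinsics, at assumed positions.
    centers = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (-1, 0, 0)]
    proj_mats = [np.hstack([np.eye(3), -np.array(c, float).reshape(3, 1)])
                 for c in centers]
    joint = np.array([0.2, 0.1, 5.0])  # ground-truth 3D joint position
    obs = [project(P, joint) for P in proj_mats]
    obs[2] = obs[2] + np.array([0.05, 0.0])  # camera 2 has a bad 2D detection
    print(leave_one_out_rewards(proj_mats, obs, joint))
```

With a single shared reward equal to the overall reconstruction accuracy, every camera would receive the same signal regardless of which view degraded the estimate. A per-camera contribution term such as the one sketched here separates those signals, which is the general motivation the abstract gives for a contribution-based reward.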