PoseFusion2: Simultaneous Background Reconstruction and Human Shape Recovery in Real-time
This work addresses the challenge of handling moving humans in SLAM for robotics and AR/VR applications, offering an incremental improvement by integrating human detection and reconstruction into a real-time framework.
The paper tackles the problem of robustly tracking and reconstructing non-rigid human shapes in dynamic environments for SLAM. The system achieves real-time performance, running at around 26 fps for background reconstruction and at up to 10 fps with human pose estimation enabled.
Dynamic environments that include unstructured moving objects pose a hard problem for Simultaneous Localization and Mapping (SLAM). The motion of rigid objects can typically be tracked by exploiting their texture and geometric features. Humans moving in the scene, however, are often among the most important interactive targets, yet they are very hard to track and reconstruct robustly because of their non-rigid shapes. In this work, we present a fast, learning-based human object detector that isolates the dynamic human objects, and we use it to realise a real-time dense background reconstruction framework. We go further by estimating and reconstructing the human pose and shape. The final output environment maps not only provide the dense static backgrounds but also contain the dynamic human meshes and their trajectories. Our Dynamic SLAM system runs at around 26 frames per second (fps) on GPUs, and at up to 10 fps when accurate human pose estimation is additionally turned on.
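The idea of isolating humans before background reconstruction can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the detector returns axis-aligned boxes `(x0, y0, x1, y1)` in pixel coordinates and that a depth value of `0.0` marks a pixel as invalid for fusion. The function names `boxes_to_mask` and `mask_human_pixels` are hypothetical, and plain nested lists are used only to keep the sketch dependency-free.

```python
def boxes_to_mask(height, width, boxes):
    """Rasterize detector boxes (x0, y0, x1, y1) into a boolean mask.

    Boxes are clipped to the image; x1 and y1 are exclusive.
    """
    mask = [[False] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = True
    return mask


def mask_human_pixels(depth, mask, invalid=0.0):
    """Return a new depth image with masked (human) pixels set to `invalid`.

    Assumption: the downstream fusion step skips pixels equal to `invalid`,
    so only the static background contributes to the dense map.
    """
    return [
        [invalid if is_human else d for d, is_human in zip(depth_row, mask_row)]
        for depth_row, mask_row in zip(depth, mask)
    ]


if __name__ == "__main__":
    depth = [[1.0] * 6 for _ in range(4)]          # toy 4x6 depth image
    mask = boxes_to_mask(4, 6, [(1, 1, 3, 3)])     # one hypothetical detection
    static_depth = mask_human_pixels(depth, mask)
    for row in static_depth:
        print(row)
```

Only the masked depth image would be passed on to background fusion in this sketch; the removed region is what a separate pose and shape estimator would work on. A per-pixel segmentation mask could replace the box rasterization without changing `mask_human_pixels`.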