Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks
This work addresses unnatural motion tracking for users of low-cost Kinect sensors, but it is incremental as it builds on existing supervised learning methods for sensor data correction.
The paper tackles the problem of unnatural human poses captured by a single Kinect sensor during self-occlusions. It uses deep recurrent neural networks to correct joint positions and velocities, and reports improved tracking accuracy when evaluated against ground truth from a commercial motion capture system.
The Kinect skeleton tracker achieves considerable human body tracking performance in a convenient and low-cost manner. However, the tracker often captures unnatural human poses, such as discontinuous and vibrating motions, when self-occlusions occur. The majority of approaches tackle this problem by using multiple Kinect sensors in a workspace. The measurements from the different sensors are then combined in a Kalman filter framework, or sensor fusion is formulated as an optimization problem. However, these methods usually require heuristics to measure the reliability of the measurements observed from each Kinect sensor. In this paper, we developed a method to improve the Kinect skeleton using a single Kinect sensor, in which a supervised learning technique was employed to correct unnatural tracking motions. Specifically, deep recurrent neural networks were used to improve the joint positions and velocities of the Kinect skeleton, and three methods were proposed to integrate the refined positions and velocities for further enhancement. Moreover, we proposed a novel measure to evaluate the naturalness of captured motions. We evaluated the proposed approach by comparison with ground truth obtained using a commercial optical marker-based motion capture system.
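
The abstract does not say how the refined positions and velocities are integrated, so the following is only a minimal sketch of one plausible scheme, not one of the paper's three methods. It assumes a simple complementary blend for a single joint: the previous fused position is advanced by the refined velocity, then mixed with the refined position. The function name, the `alpha` weight, the frame interval `dt`, and the convention that `velocities[t]` covers the step from frame `t-1` to frame `t` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def fuse_positions_and_velocities(
    positions: Sequence[Sequence[float]],
    velocities: Sequence[Sequence[float]],
    dt: float,
    alpha: float = 0.8,
) -> List[List[float]]:
    """Blend velocity-integrated predictions with refined positions.

    Hypothetical illustration only: `alpha` weights the prediction obtained
    by integrating the velocity, and (1 - alpha) weights the refined position.
    Each entry of `positions` and `velocities` is one 3D joint sample.
    """
    if len(positions) != len(velocities):
        raise ValueError("positions and velocities must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not positions:
        return []

    # The first frame has no predecessor, so take the refined position as-is.
    fused = [list(positions[0])]
    for t in range(1, len(positions)):
        prev = fused[-1]
        # Predict the current position from the previous fused estimate
        # (assumes velocities[t] describes motion from frame t-1 to frame t).
        predicted = [prev[i] + velocities[t][i] * dt for i in range(3)]
        # Mix the prediction with the refined position for this frame.
        fused.append(
            [alpha * predicted[i] + (1.0 - alpha) * positions[t][i] for i in range(3)]
        )
    return fused


# Usage: a stationary joint with one spurious jump in the second frame.
spiky = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
still = [[0.0, 0.0, 0.0]] * 3
smoothed = fuse_positions_and_velocities(spiky, still, dt=1.0 / 30.0)
```

With a larger `alpha`, the fused trajectory follows the velocity estimate more closely and suppresses isolated position jumps, at the cost of drifting if the velocities are biased. A smaller `alpha` does the opposite.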