Non-local Graph Convolutional Network for joint Activity Recognition and Motion Prediction
This work addresses human behavior analysis for applications such as robotics and surveillance. Its contribution is incremental: it builds on existing methods and improves on them with a hybrid design.
The paper tackles joint human motion prediction and activity recognition by proposing a motion context modeling methodology that combines graph convolutional and recurrent neural networks. It achieves the best prediction capability among baseline LSTM-based methods and performance comparable to state-of-the-art methods on datasets such as Human 3.6M.
3D skeleton-based motion prediction and activity recognition are two interwoven tasks in human behaviour analysis. In this work, we propose a motion context modeling methodology that provides a new way to combine the advantages of both graph convolutional neural networks and recurrent neural networks for joint human motion prediction and activity recognition. Our approach is based on using an LSTM encoder-decoder and a non-local feature extraction attention mechanism to model the spatial correlation of human skeleton data and temporal correlation among motion frames. The proposed network can easily include two output branches, one for Activity Recognition and one for Future Motion Prediction, which can be jointly trained for enhanced performance. Experimental results on Human 3.6M, CMU Mocap and NTU RGB-D datasets show that our proposed approach provides the best prediction capability among baseline LSTM-based methods, while achieving comparable performance to other state-of-the-art methods.
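To make the idea of non-local feature extraction over skeleton joints concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest form of a non-local operation: each joint's feature vector is updated with a softmax-weighted sum over all joints in the same frame, plus a residual connection. The function names (`non_local_joints`, `softmax`, `dot`) and the absence of learned projection matrices are illustrative assumptions; the paper's actual layer, its graph convolution, and the LSTM encoder-decoder are not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def non_local_joints(joints):
    """Apply a simplified non-local operation to one frame.

    joints: list of per-joint feature vectors (e.g. 3D coordinates).
    Assumption: pairwise affinity is a plain dot product with no learned
    projections; a trained model would typically embed the features first.
    """
    updated = []
    for xi in joints:
        # Attention weights of joint i over every joint j in the frame.
        weights = softmax([dot(xi, xj) for xj in joints])
        # Weighted sum of all joint features (the "non-local" context).
        context = [
            sum(w * xj[d] for w, xj in zip(weights, joints))
            for d in range(len(xi))
        ]
        # Residual connection keeps the original joint feature.
        updated.append([a + c for a, c in zip(xi, context)])
    return updated


if __name__ == "__main__":
    # Hypothetical toy frame with three joints in 3D.
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for joint in non_local_joints(frame):
        print([round(v, 3) for v in joint])
```

In a model of the kind the abstract describes, features like these would be computed per frame and passed to the recurrent encoder, whose state would then feed both the recognition branch and the prediction branch. That wiring is left out of this sketch because the abstract does not specify it.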