CV · Jul 8, 2024

Spatio-Temporal Encoding and Decoding-Based Method for Future Human Activity Skeleton Synthesis

arXiv:2407.05573v1 · 2.0 · h-index: 1

Originality Synthesis-oriented

AI Analysis

This is an incremental improvement for early activity prediction in computer vision applications.

The paper addresses the problem of predicting future human activity from observed skeleton data. It proposes a spatio-temporal encoding and decoding method that is reported to produce smaller errors with fewer model parameters than some existing algorithms.

Inferring future activity information from observed activity data is a crucial step in improving the accuracy of early activity prediction. Traditional methods based on generative adversarial networks (GANs) or joint learning frameworks can achieve good prediction accuracy at low observation ratios, but they usually have high computational costs. To address this, the paper proposes a spatio-temporal encoding and decoding-based method for future human activity skeleton synthesis. First, algorithms such as time control, discrete cosine transform, and low-pass filtering are used to cut or pad the skeleton sequences. Second, an encoder extracts an intermediate semantic encoding from the observed skeleton sequences, and a decoder infers the future sequences from that encoding. Finally, joint displacement error, velocity error, and acceleration error, described as three higher-order kinematic features, are used as key components of the loss function to optimize the model parameters. Experimental results show that the proposed future skeleton synthesis algorithm performs better than some existing algorithms. It generates skeleton sequences with smaller errors and uses fewer model parameters, effectively providing future information for early activity prediction.
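The loss described in the abstract combines displacement, velocity, and acceleration errors. The following is a minimal sketch of one way such a loss could be computed, not the authors' implementation. It assumes that velocity and acceleration are approximated by first and second finite differences between consecutive frames, that each error is a mean Euclidean distance per joint, and that the three terms are summed with weights. The function names and the `w_pos`, `w_vel`, and `w_acc` weights are hypothetical and do not come from the paper.

```python
import math


def frame_differences(seq):
    """Finite difference along time: seq[t + 1] - seq[t] for every joint."""
    return [
        [tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]


def mean_joint_error(pred, target):
    """Mean Euclidean distance between matching joints over all frames."""
    dists = [
        math.dist(p, q)
        for frame_p, frame_t in zip(pred, target)
        for p, q in zip(frame_p, frame_t)
    ]
    return sum(dists) / len(dists)


def kinematic_loss(pred, target, w_pos=1.0, w_vel=1.0, w_acc=1.0):
    """Weighted sum of displacement, velocity, and acceleration errors.

    pred and target are sequences of frames; each frame is a sequence of
    joints; each joint is an (x, y, z) tuple. Both are assumed to have the
    same number of frames and joints. The weights are hypothetical.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    if len(pred) < 3:
        raise ValueError("at least 3 frames are needed for acceleration")

    # Displacement error: distance between predicted and true joint positions.
    pos_err = mean_joint_error(pred, target)

    # Velocity error: compare first finite differences.
    pred_vel = frame_differences(pred)
    target_vel = frame_differences(target)
    vel_err = mean_joint_error(pred_vel, target_vel)

    # Acceleration error: compare second finite differences.
    acc_err = mean_joint_error(
        frame_differences(pred_vel), frame_differences(target_vel)
    )

    return w_pos * pos_err + w_vel * vel_err + w_acc * acc_err
```

Under these assumptions, a prediction that is shifted by a constant offset is penalized only by the displacement term, because its frame-to-frame motion matches the target. The velocity and acceleration terms instead penalize predictions whose motion is jittery or drifts in speed even when the positions are close.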
