Miguel Rio

CV · Nov 4, 2021

Skeleton-Split Framework using Spatial Temporal Graph Convolutional Networks for Action Recognition

Motasem Alsawadi, Miguel Rio

The volume of videos and related content uploaded to the internet has increased dramatically. Accordingly, the need for efficient algorithms to analyse this vast amount of data has attracted significant research interest. Action recognition systems based on human body motion have been shown to interpret video content accurately. This work aims to recognize activities of daily living using the ST-GCN model, and compares four partitioning strategies: spatial configuration partitioning, full distance split, connection split, and index split. To this end, we present the first implementation of the ST-GCN framework on the HMDB-51 dataset, achieving 48.88% top-1 accuracy with the connection split partitioning approach. Through experimental simulation, we show that our proposals achieve higher accuracy on the UCF-101 dataset using the ST-GCN framework than the state-of-the-art approach. Finally, we achieve 73.25% top-1 accuracy using the index split partitioning strategy.

CV · Aug 3, 2021

Skeleton Split Strategies for Spatial Temporal Graph Convolution Networks

Motasem S. Alsawadi, Miguel Rio

A skeleton representation of the human body has been proven to be effective for action recognition. The skeletons are represented as graphs. However, the topology of a graph is not structured like Euclidean data. Therefore, a new set of methods to perform the convolution operation on the skeleton graph is presented. Our proposal is based on the ST-GCN framework proposed by Yan et al. [1]. In this study, we present an improved set of label mapping methods for the ST-GCN framework. We introduce three split processes (full distance split, connection split, and index split) as an alternative approach for the convolution operation. To evaluate performance, the models in this study were trained on two benchmark datasets: NTU-RGB+D and Kinetics. Our results indicate that all of our split processes outperform the previous partition strategies and are more stable during training, without using the additional edge importance weighting training parameter. Therefore, our proposal can provide a more realistic solution for real-time applications centred on recognizing activities of daily living in indoor environments.
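The idea of convolving over a partitioned skeleton graph can be illustrated with a small sketch. The abstract does not define the full distance, connection, or index splits, so the code below uses a plain hop-distance partitioning as a stand-in, and it should not be read as the authors' method. The function names `hop_distances` and `partitioned_graph_conv`, the toy three-joint chain, and the weights are all hypothetical. Each joint's neighbours are grouped by hop distance, the features in each group are averaged, and each group gets its own weight matrix.

```python
from collections import deque


def hop_distances(num_nodes, edges):
    """All-pairs hop distances on an undirected graph, computed with BFS."""
    adj = [[] for _ in range(num_nodes)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = [[None] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def partitioned_graph_conv(features, edges, weights):
    """One spatial graph convolution over a skeleton graph.

    features: one feature vector (length C_in) per joint.
    weights:  one C_in x C_out matrix per partition; partition k holds the
              joints at hop distance k (an illustrative assumption).
    """
    n = len(features)
    dist = hop_distances(n, edges)
    c_out = len(weights[0][0])
    out = [[0.0] * c_out for _ in range(n)]
    for k, w in enumerate(weights):
        for i in range(n):
            members = [j for j in range(n) if dist[i][j] == k]
            if not members:
                continue
            # Average the subset's features, then apply that subset's weights.
            for j in members:
                for a, x in enumerate(features[j]):
                    for b in range(c_out):
                        out[i][b] += x * w[a][b] / len(members)
    return out


# Toy skeleton: a chain of three joints, 0 - 1 - 2, with one feature each.
features = [[1.0], [2.0], [4.0]]
edges = [(0, 1), (1, 2)]
weights = [[[1.0]], [[10.0]]]  # partition 0: the joint itself; partition 1: 1-hop neighbours
print(partitioned_graph_conv(features, edges, weights))
```

Because every partition has its own weights, the root joint and its neighbours contribute differently to the output. That is the property a regular image convolution gets for free from its fixed grid layout.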

2 Papers