CV | Sep 13, 2023

TransNet: A Transfer Learning-Based Network for Human Action Recognition

arXiv:2309.06951v1 | 2.81 citations | h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses efficiency and effectiveness challenges in human action recognition for computer vision applications, representing an incremental improvement through transfer learning integration.

The paper targets the complex structures and lengthy training times of existing human action recognition models. It proposes TransNet, a simple end-to-end deep learning architecture that decomposes 3D-CNNs into 2D- and 1D-CNNs, and reports better flexibility, lower model complexity, faster training, and higher classification accuracy than state-of-the-art models.

Human action recognition (HAR) is a high-level and significant research area in computer vision due to its ubiquitous applications. The main limitations of the current HAR models are their complex structures and lengthy training time. In this paper, we propose a simple yet versatile and effective end-to-end deep learning architecture, coined as TransNet, for HAR. TransNet decomposes the complex 3D-CNNs into 2D- and 1D-CNNs, where the 2D- and 1D-CNN components extract spatial features and temporal patterns in videos, respectively. Benefiting from its concise architecture, TransNet is ideally compatible with any pretrained state-of-the-art 2D-CNN models in other fields, being transferred to serve the HAR task. In other words, it naturally leverages the power and success of transfer learning for HAR, bringing huge advantages in terms of efficiency and effectiveness. Extensive experimental results and the comparison with the state-of-the-art models demonstrate the superior performance of the proposed TransNet in HAR in terms of flexibility, model complexity, training speed and classification accuracy.
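The decomposition described in the abstract can be illustrated with a minimal PyTorch sketch: a 2D-CNN is applied to every frame to extract spatial features, and a 1D-CNN then runs along the time axis of the resulting feature sequence. This is not the authors' implementation. The class names (`TinySpatialBackbone`, `FactorizedVideoClassifier`), the layer sizes, and the choice to freeze the backbone are all assumptions made for illustration, and the small backbone is only a stand-in for a pretrained 2D-CNN.

```python
import torch
import torch.nn as nn


class TinySpatialBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained 2D-CNN: one frame -> one feature vector."""

    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N, 16, 1, 1)
            nn.Flatten(),             # (N, 16)
            nn.Linear(16, feat_dim),  # (N, feat_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class FactorizedVideoClassifier(nn.Module):
    """Illustrative 2D (spatial) + 1D (temporal) pipeline; sizes are assumptions."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int,
                 freeze_backbone: bool = True):
        super().__init__()
        self.backbone = backbone
        if freeze_backbone:
            # Transfer-learning style: keep the 2D weights fixed, train only the rest.
            for p in self.backbone.parameters():
                p.requires_grad = False
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, 64, kernel_size=3, padding=1),  # convolve over time
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
            nn.Flatten(),             # (batch, 64)
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, time, channels, height, width)
        b, t, c, h, w = video.shape
        frames = video.reshape(b * t, c, h, w)           # fold time into the batch
        feats = self.backbone(frames)                    # (b*t, feat_dim)
        feats = feats.reshape(b, t, -1).transpose(1, 2)  # (b, feat_dim, t)
        return self.head(self.temporal(feats))           # (b, num_classes)


model = FactorizedVideoClassifier(TinySpatialBackbone(32), feat_dim=32, num_classes=10)
clip = torch.randn(2, 8, 3, 64, 64)  # 2 clips, 8 frames each, 64x64 RGB
logits = model(clip)
print(logits.shape)
```

Because the spatial component only needs to map a single image to a feature vector, the stand-in backbone could in principle be replaced by any pretrained 2D image model with its classification layer removed, which is the compatibility property the abstract emphasizes.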
