CV · AI · Nov 27, 2024

An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition

arXiv:2411.18002v1 · h-index: 3 · APSIPA
Originality: Incremental advance
AI Analysis

This work addresses the computational cost of egocentric action recognition, but the advance is incremental: it builds on existing two-stream models with modifications.

The paper tackles the high computational cost of traditional two-stream networks for human action recognition by replacing the optical flow branch with a representation flow algorithm. It reports comparable or slightly higher accuracy (e.g., 0.84% higher on HMDB) and much shorter prediction runtimes (e.g., 0.1459 s instead of 203.9958 s on HMDB).
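To put the runtime figures in perspective, the short sketch below converts the prediction runtimes reported in the paper's abstract into speedup factors. The numbers are copied from the abstract; the variable names are illustrative only, and the calculation says nothing about how the runtimes were measured.

```python
# Prediction runtimes in seconds, as reported in the abstract:
# (original two-stream model, proposed representation-flow model).
runtimes = {
    "GTEA61": (101.6795, 0.1881),
    "EGTEA GAZE+": (25.3799, 0.1503),
    "HMDB": (203.9958, 0.1459),
}

# Speedup factor = original runtime / new runtime.
speedups = {name: orig / new for name, (orig, new) in runtimes.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

The factors come out at roughly 540x, 170x, and 1,400x, so the size of the gain varies considerably by dataset.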

With the rapid advancements in deep learning, computer vision tasks have seen significant improvements, making two-stream neural networks a popular focus for video-based action recognition. Traditional models using RGB and optical flow streams achieve strong performance but at a high computational cost. To address this, we introduce a representation flow algorithm to replace the optical flow branch in the egocentric action recognition model, enabling end-to-end training while reducing computational cost and prediction time. Our model, designed for egocentric action recognition, uses class activation maps (CAMs) to improve accuracy and ConvLSTM for spatiotemporal encoding with spatial attention. When evaluated on the GTEA61, EGTEA GAZE+, and HMDB datasets, our model matches the accuracy of the original model on GTEA61 and exceeds it by 0.65% and 0.84% on EGTEA GAZE+ and HMDB, respectively. Prediction runtimes are reduced to 0.1881 s, 0.1503 s, and 0.1459 s, respectively, compared to the original model's 101.6795 s, 25.3799 s, and 203.9958 s. Ablation studies were also conducted to study the impact of different parameters on model performance.

Keywords: two-stream, egocentric, action recognition, CAM, representation flow, ConvLSTM
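The abstract mentions class activation maps used together with spatial attention but does not say how the two are combined. As a rough illustration only, the sketch below shows one common formulation: compute a CAM as a class-weighted sum of feature channels, normalize it with a softmax over spatial positions, and use it to reweight the features before a recurrent encoder such as a ConvLSTM. The function name, shapes, and the choice of softmax are assumptions, not the authors' implementation.

```python
import numpy as np

def cam_spatial_attention(features, fc_weights, class_idx=None):
    """Hypothetical helper: CAM-based spatial attention.

    features:   (C, H, W) feature maps from the last convolutional layer.
    fc_weights: (num_classes, C) weights of a linear classifier applied
                after global average pooling.
    """
    c, h, w = features.shape
    if class_idx is None:
        # Pick the top-scoring class from the pooled features.
        pooled = features.mean(axis=(1, 2))        # (C,)
        class_idx = int(np.argmax(fc_weights @ pooled))
    # CAM: weighted sum of the channels using the chosen class's weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    # Softmax over all spatial positions (assumed normalization).
    flat = np.exp(cam.reshape(-1) - cam.max())
    attention = (flat / flat.sum()).reshape(h, w)
    # Broadcast the (H, W) map over every channel.
    attended = features * attention
    return attention, attended

# Toy input: 8 channels on a 7x7 grid, 5 classes.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 7, 7))
weights = rng.standard_normal((5, 8))
attention, attended = cam_spatial_attention(feats, weights)
```

In a full model, the attended features for each frame would be passed to the recurrent encoder in sequence; that part is omitted here.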

