CV · Aug 19, 2019

Cross-Enhancement Transform Two-Stream 3D ConvNets for Action Recognition

arXiv:1908.08916v20.002 citations
AI Analysis25

This is an incremental improvement for action recognition in computer vision, addressing variations in human actions across different environments.

The paper tackles action recognition by proposing a Cross-Enhancement Transform Two-Stream 3D ConvNets algorithm in which the better-performing stream assists in training the other stream; experiments on the UCF-101, HMDB-51, and Kinetics-400 datasets confirm its effectiveness.

Action recognition is an important research topic in computer vision. It is foundational to visual understanding and has been applied in many fields. Because human actions vary across environments, it is difficult to infer actions in completely different states with a single structural model. To address this, we propose a Cross-Enhancement Transform Two-Stream 3D ConvNets algorithm, which considers the action distribution characteristics of the specific dataset. The better-performing of the two streams serves as a teaching model and is expected to assist in training the other stream. The enhanced-trained stream and the teacher stream are then combined to infer actions. We run experiments on the video datasets UCF-101, HMDB-51, and Kinetics-400, and the results confirm the effectiveness of our algorithm.
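The teacher-assisted training and combined inference described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not give the loss or the fusion rule, so the sketch assumes a distillation-style objective (cross-entropy plus a KL term toward the teacher) and score averaging at inference. The stream names `rgb` and `flow`, the weight `alpha`, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_teacher(val_acc):
    """Return the name of the stream with the higher validation accuracy."""
    return max(val_acc, key=val_acc.get)


def enhanced_loss(student_logits, teacher_logits, label, alpha=0.5):
    """Assumed objective: (1 - alpha) * cross-entropy + alpha * KL(teacher || student).

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    ce = -math.log(p_student[label])
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return (1 - alpha) * ce + alpha * kl


def fuse(logits_a, logits_b):
    """Assumed late fusion: average the two streams' class probabilities."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    avg = [(a + b) / 2 for a, b in zip(p_a, p_b)]
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical validation accuracies on one dataset decide which stream teaches.
teacher = pick_teacher({"rgb": 0.90, "flow": 0.85})
loss = enhanced_loss([0.0, 0.0], [5.0, 0.0], label=0)
prediction = fuse([2.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Choosing the teacher per dataset mirrors the abstract's point that the stronger stream depends on the action distribution of the specific dataset; everything else here is an illustrative stand-in.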

