CV · RO · Mar 5, 2023

MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition

arXiv:2303.02575v2 · 17 citations · h-index: 102

AI Analysis

This work addresses robust action recognition in aerial videos for applications such as surveillance and monitoring. It is incremental in that it builds on existing models such as X3D.

The paper tackles action recognition in UAV videos by addressing occlusion and viewpoint changes. It reports Top-1 accuracy improvements over state-of-the-art methods of 18.9% on UAV-Human, 7.3% on Drone-Action, and 7.16% on NEC Drones.

We present a novel approach for action recognition in UAV videos. Our formulation is designed to handle occlusion and viewpoint changes caused by the movement of a UAV. We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain. This enables our recognition model to learn from the key features associated with the motion. We also propose a novel frame sampling method that uses joint mutual information to acquire the most informative frame sequence in UAV videos. We have integrated our approach with X3D and evaluated the performance on multiple datasets. In practice, we achieve an 18.9% improvement in Top-1 accuracy over current state-of-the-art methods on UAV-Human (Li et al., 2021), a 7.3% improvement on Drone-Action (Perera et al., 2019), and a 7.16% improvement on NEC Drones (Choi et al., 2020).
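To make the idea of mutual-information-based frame sampling concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not give the formulation, so this assumes a simple histogram estimate of pairwise mutual information between grayscale frames and a greedy rule that keeps the least redundant frames. The names `mutual_information` and `sample_frames`, the `bins` parameter, and the greedy rule are all illustrative assumptions.

```python
import numpy as np


def mutual_information(a, b, bins=16):
    """Histogram estimate of I(A; B) in nats for two same-sized 8-bit grayscale frames."""
    joint, _, _ = np.histogram2d(
        a.ravel(), b.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    p_ab = joint / joint.sum()             # joint distribution of binned intensities
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of frame a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of frame b, shape (1, bins)
    nz = p_ab > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))


def sample_frames(frames, k, bins=16):
    """Greedy proxy (an assumption, not the paper's method): start from frame 0,
    then repeatedly add the frame whose highest MI with any selected frame is lowest."""
    k = min(k, len(frames))
    selected = [0]
    # max_mi[i] = highest MI between frame i and any frame selected so far
    max_mi = np.array([mutual_information(frames[0], f, bins) for f in frames])
    while len(selected) < k:
        candidates = max_mi.copy()
        candidates[selected] = np.inf      # never re-select a chosen frame
        nxt = int(np.argmin(candidates))
        selected.append(nxt)
        new_mi = np.array([mutual_information(frames[nxt], f, bins) for f in frames])
        max_mi = np.maximum(max_mi, new_mi)
    return sorted(selected)                # keep temporal order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = [rng.integers(0, 256, size=(32, 32)) for _ in range(8)]
    print(sample_frames(video, k=4))
```

The selected indices would then be passed to a recognition backbone such as X3D. A joint mutual information criterion, as named in the abstract, would score whole frame subsets rather than pairs; the pairwise rule here is only a cheap stand-in.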
