CV, RO · Mar 2, 2023

AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

arXiv:2303.01589v1 · 19 citations · h-index: 102
Originality: Incremental advance
AI Analysis

This paper addresses action recognition in UAV-captured videos, with applications in robotics and drones, and reports incremental improvements over existing methods.

The paper tackles aerial video action recognition with a method that combines auto zoom and temporal reasoning, reporting improvements over the state of the art of 6.1-7.4% on RoCoG-v2, 8.3-10.4% on UAV-Human, and 3.2% on Drone Action.

We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately. This makes it easier to extract the key features and reduces the computational overhead. We also present an efficient temporal reasoning algorithm to capture the action information along the spatial and temporal domains within a controllable computational cost. Our approach has been implemented and evaluated both on a desktop with high-end GPUs and on the low-power Robotics RB5 Platform for robots and drones. In practice, we achieve a 6.1-7.4% improvement over SOTA in Top-1 accuracy on the RoCoG-v2 dataset, an 8.3-10.4% improvement on the UAV-Human dataset, and a 3.2% improvement on the Drone Action dataset.
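The auto zoom idea in the abstract (find the human target, then scale it appropriately) can be illustrated with a minimal sketch. This is not the paper's learned method: it assumes a bounding box for the person is already available from some detector, and the function name `auto_zoom_crop`, the `margin` and `out_size` parameters, and the nearest-neighbour resampling are all hypothetical choices made for illustration.

```python
import numpy as np

def auto_zoom_crop(frame, box, out_size=224, margin=0.2):
    """Crop a square window around `box` and resample it to out_size x out_size.

    frame : H x W x C (or H x W) array for one video frame.
    box   : (x0, y0, x1, y1) pixel box around the person. How the box is
            obtained is outside this sketch (assumed given).
    margin: hypothetical extra context kept around the box, as a fraction.
    """
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box

    # Square window sized to the longer box side plus some context,
    # but never larger than the frame itself.
    side = max(x1 - x0, y1 - y0) * (1.0 + margin)
    side = max(1.0, min(side, float(min(h, w))))
    half = side / 2.0

    # Centre the window on the box, shifting it so it stays inside the frame.
    cx = min(max((x0 + x1) / 2.0, half), w - half)
    cy = min(max((y0 + y1) / 2.0, half), h - half)
    left, top = cx - half, cy - half

    # Nearest-neighbour sampling grid over the window (pixel centres).
    steps = (np.arange(out_size) + 0.5) * side / out_size
    xs = np.clip((left + steps).astype(int), 0, w - 1)
    ys = np.clip((top + steps).astype(int), 0, h - 1)

    # Advanced indexing picks the sampled rows and columns in one step.
    return frame[ys[:, None], xs[None, :]]
```

In an aerial frame the person may cover only a tiny fraction of the pixels. After a crop like this, the same person fills a large share of a much smaller input, which is the intuition behind the abstract's claim that zooming eases feature extraction and reduces computation.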

