CV · Apr 16, 2020

Asynchronous Interaction Aggregation for Action Detection

arXiv:2004.07485v1 · 133 citations · Has Code
AI Analysis

This work addresses video action detection and reports an incremental but measurable accuracy gain over a strong baseline.

The paper tackles the problem of video action detection by proposing the Asynchronous Interaction Aggregation network (AIA), which integrates multiple interaction types and models long-term interactions dynamically, resulting in a 3.7 mAP gain (12.6% relative improvement) on the AVA dataset compared to a strong baseline.
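The two figures quoted above (3.7 mAP absolute, 12.6% relative) can be cross-checked with a little arithmetic. The sketch below assumes the relative improvement is defined as gain divided by baseline mAP; the baseline and final scores it prints are inferred from the rounded figures, not values reported in this summary, and `implied_baseline` is a hypothetical helper name.

```python
def implied_baseline(abs_gain: float, rel_gain: float) -> float:
    """Baseline score implied by an absolute gain and a relative gain.

    Assumes rel_gain = abs_gain / baseline (an assumption, see lead-in).
    """
    return abs_gain / rel_gain


abs_gain = 3.7    # mAP points, as quoted in the summary
rel_gain = 0.126  # 12.6% relative improvement, as quoted in the summary

baseline = implied_baseline(abs_gain, rel_gain)
improved = baseline + abs_gain

# Both inputs are rounded, so treat these as approximate.
print(f"implied baseline mAP: {baseline:.1f}")  # 29.4
print(f"implied final mAP:    {improved:.1f}")  # 33.1
```

Because both inputs are rounded to one decimal place, the implied scores are only accurate to within a few tenths of a point.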

Understanding interaction is an essential part of video action detection. We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection. There are two key designs in it: one is the Interaction Aggregation structure (IA), which adopts a uniform paradigm to model and integrate multiple types of interaction; the other is the Asynchronous Memory Update algorithm (AMU), which enables us to achieve better performance by modeling very long-term interaction dynamically without huge computation cost. We provide empirical evidence to show that our network can gain notable accuracy from the integrative interactions and is easy to train end-to-end. Our method reports new state-of-the-art performance on the AVA dataset, with a 3.7 mAP gain (12.6% relative improvement) on the validation split compared to our strong baseline. The results on the UCF101-24 and EPIC-Kitchens datasets further illustrate the effectiveness of our approach. Source code will be made public at: https://github.com/MVIG-SJTU/AlphAction.
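The two designs named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes the "uniform paradigm" can be approximated by one attention-style block reused for every interaction type, and that the memory is a pool of cached clip features that is read before, and refreshed after, each training step. All names here (`attend`, `aggregate`, `MemoryPool`, `training_step`, the `(video_id, clip_index)` key) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Feature = List[float]
Key = Tuple[str, int]  # (video_id, clip_index): a hypothetical key layout


def attend(query: Feature, context: List[Feature]) -> Feature:
    """Scaled dot-product attention of one query over context features."""
    if not context:
        return list(query)  # nothing to interact with
    scale = math.sqrt(len(query))
    scores = [sum(q * c for q, c in zip(query, ctx)) / scale for ctx in context]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    mixed = [
        sum(w * ctx[i] for w, ctx in zip(weights, context)) / total
        for i in range(len(query))
    ]
    # Residual connection: keep the query and add the aggregated context.
    return [q + m for q, m in zip(query, mixed)]


def aggregate(person: Feature, others: List[Feature],
              objects: List[Feature], memory: List[Feature]) -> Feature:
    """Apply the same block to each interaction type in turn (assumed order)."""
    out = person
    for context in (others, objects, memory):
        out = attend(out, context)
    return out


class MemoryPool:
    """Cache of clip features that is read and updated outside of backprop."""

    def __init__(self) -> None:
        self._store: Dict[Key, Feature] = {}

    def read(self, video_id: str, clip_index: int, window: int) -> List[Feature]:
        """Return cached features of neighbouring clips within the window."""
        neighbours = []
        for offset in range(-window, window + 1):
            feature = self._store.get((video_id, clip_index + offset))
            if offset != 0 and feature is not None:
                neighbours.append(feature)
        return neighbours

    def write(self, video_id: str, clip_index: int, feature: Feature) -> None:
        self._store[(video_id, clip_index)] = list(feature)


def training_step(pool: MemoryPool, video_id: str, clip_index: int,
                  person: Feature, window: int = 2) -> Feature:
    memory = pool.read(video_id, clip_index, window)  # possibly stale features
    out = aggregate(person, others=[], objects=[], memory=memory)
    pool.write(video_id, clip_index, person)  # refresh the cache afterwards
    return out


pool = MemoryPool()
first = training_step(pool, "video_a", 0, [1.0, 0.0])   # pool is still empty
second = training_step(pool, "video_a", 1, [0.0, 1.0])  # sees clip 0 in memory
print(first, second)
```

In this sketch the long-term context costs only a dictionary lookup per neighbouring clip, which is one plausible reading of how cached features avoid recomputing the whole temporal window at each step.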

Code Implementations · 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
