CV · LG · Dec 6, 2018

Tri-axial Self-Attention for Concurrent Activity Recognition

arXiv:1812.02817v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of recognizing multiple overlapping activities in video data. The advance is incremental: it builds on existing attention and transformer methods.

The paper tackles concurrent activity recognition by proposing a tri-axial self-attention system that extracts and models features for individual activities, achieving state-of-the-art or comparable performance on three standard datasets.

We present a system for concurrent activity recognition. To extract features associated with different activities, we propose a feature-to-activity attention that maps the extracted global features to sub-features associated with individual activities. To model the temporal associations of individual activities, we propose a transformer-network encoder that models independent temporal associations for each activity. To make the concurrent activity prediction aware of the potential associations between activities, we propose self-attention with an association mask. Our system achieves state-of-the-art or comparable performance on three commonly used concurrent activity detection datasets. Our visualizations demonstrate that our system is able to locate the important spatial-temporal features for final decision making. We also show that our system can be applied to general multilabel classification problems.
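
The abstract's "self-attention with an association mask" can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a boolean mask in which `mask[i][j]` says whether activity `i` may attend to activity `j`, and it uses the raw feature vectors as queries, keys, and values instead of learned projections. The function names and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(features, mask):
    """Scaled dot-product self-attention across per-activity feature vectors.

    features: list of A vectors (one per activity), each of length d.
    mask:     A x A booleans (assumed form); mask[i][j] is True when
              activity i may attend to activity j. Each row must allow
              at least one entry, e.g. the diagonal.
    Returns (outputs, weights).
    """
    d = len(features[0])
    outputs, weights = [], []
    for i, query in enumerate(features):
        scores = []
        for j, key in enumerate(features):
            if mask[i][j]:
                dot = sum(a * b for a, b in zip(query, key))
                scores.append(dot / math.sqrt(d))
            else:
                # Disallowed pairs are pushed to -inf before the softmax.
                scores.append(float("-inf"))
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors (here, the features themselves).
        outputs.append(
            [sum(w[j] * features[j][c] for j in range(len(features)))
             for c in range(d)]
        )
    return outputs, weights


# Toy example: activities 0 and 1 are associated; activity 2 stands alone.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mask = [
    [True, True, False],
    [True, True, False],
    [False, False, True],
]
outputs, weights = masked_self_attention(features, mask)
```

In this toy setup, activity 2 attends only to itself, so its output equals its input, while activities 0 and 1 mix information with each other and ignore activity 2.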

