CV | Aug 13, 2019

Three Branches: Detecting Actions With Richer Features

arXiv:1908.04519v1 | 3 citations
AI Analysis

This work addresses video action recognition and spatio-temporal action localization, reporting improved results on the Kinetics and AVA challenge benchmarks.

The paper tackles action recognition and localization by fusing global video-clip, short-term human attention, and long-term human activity features. It reports a 21.59% error rate on Kinetics and 32.49% mAP on the AVA test sets, outperforming all submissions to the CVPR 2018 AVA challenge by more than 10% mAP.

We present our three-branch solution for the International Challenge on Activity Recognition at CVPR 2019. The model fuses richer information from the global video clip, short-term human attention, and long-term human activity into a unified model. We participated in two tasks: Task A, the Kinetics challenge, and Task B, the spatio-temporal action localization challenge. For Kinetics, we achieve a 21.59% error rate. For the AVA challenge, our final model obtains 32.49% mAP on the test sets, which outperforms all submissions to the AVA challenge at CVPR 2018 by more than 10% mAP. As future work, we will introduce human activity knowledge, a new dataset that includes key information about human activity.

