CV · MM · Aug 7, 2021

Temporal Action Localization Using Gated Recurrent Units

arXiv:2108.03375v2 · 5 citations
AI Analysis

The paper addresses the challenge of accurately localizing actions in videos, a task that has real-world applications but has not yet reached acceptable accuracy, and it reports a specific, sizable gain over prior work.

The paper tackles the Temporal Action Localization (TAL) problem by proposing a GRU-based network with novel post-processing methods, achieving 27.52% mAP at IoU 0.7 on Thumos14, which is 5.12% better than the state of the art.

The Temporal Action Localization (TAL) task, which is to predict the start and end of each action in a video along with the class label of the action, has numerous real-world applications. Because of the complexity of this task, acceptable accuracy rates have not yet been achieved, unlike in the action recognition task. In this paper, we propose a new network based on the Gated Recurrent Unit (GRU) and two novel post-processing methods for the TAL task. Specifically, we propose a new design for the output layer of the conventional GRU, resulting in the so-called GRU-Split network. Moreover, linear interpolation is used to generate action proposals with precise start and end times. Finally, to rank the generated proposals appropriately, we use a Learn to Rank (LTR) approach. We evaluated the proposed method on the Thumos14 and ActivityNet-1.3 datasets. The results show that the proposed method outperforms the state of the art. Specifically, in the mean Average Precision (mAP) metric at an Intersection over Union (IoU) of 0.7 on Thumos14, we obtain 27.52%, which is 5.12% better than state-of-the-art methods.
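
To make the two quantities in the abstract concrete, here is a minimal Python sketch. The temporal IoU function follows the standard definition for two time intervals. The proposal function is only one plausible reading of "linear interpolation" and is not taken from the paper: it assumes per-snippet actionness scores, a fixed threshold, and boundaries placed where the piecewise-linear score curve crosses that threshold. The names `temporal_iou`, `proposals_from_scores`, `scores`, `times`, and `threshold` are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_time, end_time) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over Union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def proposals_from_scores(
    scores: List[float], times: List[float], threshold: float = 0.5
) -> List[Segment]:
    """Hypothetical helper (an assumption, not the authors' method).

    Turns per-snippet scores into (start, end) proposals. Each boundary is
    placed by linear interpolation between the two neighbouring snippets
    whose scores lie on opposite sides of the threshold.
    """
    proposals: List[Segment] = []
    # If the video already starts above the threshold, open a proposal at once.
    start = times[0] if scores[0] >= threshold else None
    for i in range(1, len(scores)):
        s0, s1 = scores[i - 1], scores[i]
        if (s0 < threshold) != (s1 < threshold):  # the curve crosses the threshold
            frac = (threshold - s0) / (s1 - s0)
            t = times[i - 1] + frac * (times[i] - times[i - 1])
            if s1 >= threshold:
                start = t  # rising crossing opens a proposal
            else:
                proposals.append((start, t))  # falling crossing closes it
                start = None
    if start is not None:  # proposal still open at the end of the video
        proposals.append((start, times[-1]))
    return proposals


if __name__ == "__main__":
    found = proposals_from_scores([0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 2.0, 3.0])
    print(found)
    print(temporal_iou(found[0], (1.0, 3.0)))
```

Under the usual evaluation protocol, a proposal counts as a correct detection at IoU 0.7 only if `temporal_iou` with a ground-truth segment of the same class is at least 0.7, which is why sub-snippet boundary placement can matter at this strict threshold.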

Code Implementations: 1 repo