CV, Jun 28, 2018

Modeling Spatio-Temporal Human Track Structure for Action Localization

Guilhem Chéron, Anton Osokin, Ivan Laptev, Cordelia Schmid

arXiv:1806.11008v1, 3.33 citations

Originality: Incremental advance

AI Analysis

The paper addresses the problem of accurately localizing actions in space and time in videos, for applications such as video analysis, and offers incremental improvements over existing methods.

This paper tackles spatio-temporal human action localization in video by proposing a recurrent localization network (RecLNet) that models temporal structure on person tracks, resulting in substantial improvements in localization performance on the UCF101-24 and DALY datasets.

This paper addresses spatio-temporal localization of human actions in video. To localize actions in time, we propose a recurrent localization network (RecLNet) designed to model the temporal structure of actions at the level of person tracks. Our model is trained to simultaneously recognize and localize action classes in time and is based on two-layer gated recurrent units (GRUs) applied separately to two streams, i.e., the appearance and optical flow streams. When used together with state-of-the-art person detection and tracking, our model is shown to substantially improve spatio-temporal action localization in videos. The gain is shown to be mainly due to improved temporal localization. We evaluate our method on two recent datasets for spatio-temporal action localization, UCF101-24 and DALY, demonstrating a significant improvement over the state of the art.
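To make the architecture in the abstract more concrete, here is a minimal NumPy sketch of a two-layer GRU applied separately to an appearance stream and an optical flow stream along one person track, producing per-frame class scores. It is an illustration under stated assumptions, not the authors' implementation: the class and function names (`GRULayer`, `StreamGRU`, `score_track`), the feature and hidden sizes, the extra background class, and the fusion by averaging per-stream softmax scores are all hypothetical choices, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(logits):
    # Row-wise softmax with max subtraction for numerical stability.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


class GRULayer:
    """One GRU layer with random (untrained) weights; illustrative only."""

    def __init__(self, input_dim, hidden_dim, rng):
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n.
        self.W = rng.normal(0.0, 0.1, (3, hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        self.hidden_dim = hidden_dim

    def forward(self, xs):
        """Map a sequence xs of shape (T, input_dim) to states (T, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        states = []
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * h + z * n
            states.append(h)
        return np.stack(states)


class StreamGRU:
    """Two stacked GRU layers plus a linear per-frame classifier for one stream."""

    def __init__(self, feat_dim, hidden_dim, num_classes, rng):
        self.layers = [
            GRULayer(feat_dim, hidden_dim, rng),
            GRULayer(hidden_dim, hidden_dim, rng),
        ]
        # Assumption: one extra output for "background" (no action in the frame).
        self.W_out = rng.normal(0.0, 0.1, (num_classes + 1, hidden_dim))

    def forward(self, feats):
        h = feats
        for layer in self.layers:
            h = layer.forward(h)
        return h @ self.W_out.T  # per-frame logits, shape (T, num_classes + 1)


def score_track(appearance_feats, flow_feats, app_stream, flow_stream):
    """Assumed late fusion: average the per-frame softmax scores of both streams."""
    app_scores = softmax(app_stream.forward(appearance_feats))
    flow_scores = softmax(flow_stream.forward(flow_feats))
    return 0.5 * (app_scores + flow_scores)


rng = np.random.default_rng(0)
T, feat_dim, hidden_dim, num_classes = 12, 16, 8, 3  # hypothetical sizes
app_stream = StreamGRU(feat_dim, hidden_dim, num_classes, rng)
flow_stream = StreamGRU(feat_dim, hidden_dim, num_classes, rng)

# Random stand-ins for per-frame features pooled from one person track.
scores = score_track(
    rng.normal(size=(T, feat_dim)),
    rng.normal(size=(T, feat_dim)),
    app_stream,
    flow_stream,
)
print(scores.shape)  # (12, 4): one score vector per frame of the track
```

Because the scores are produced per frame of the track, the temporal extent of an action could in principle be read off from the frames where an action class outscores the background. How the paper actually turns scores into temporal boundaries is not stated in the abstract.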

