Learning to Discriminate Information for Online Action Detection
This work addresses real-time action recognition in videos for applications such as surveillance and human-computer interaction. It is an incremental advance that refines recurrent network architectures.
The paper tackles online action detection in streaming videos, where the input contains background and unrelated actions alongside the action of interest. It proposes a novel recurrent unit that discriminates relevant from irrelevant information to improve detection accuracy. The method reports significant performance improvements over state-of-the-art methods on the TVSeries and THUMOS-14 datasets.
Given a streaming video, online action detection aims to identify the action occurring in the present. For this task, previous methods use recurrent networks to model the temporal sequence of current action frames. However, these methods overlook the fact that an input image sequence includes background and irrelevant actions as well as the action of interest. In this paper, we propose a novel recurrent unit for online action detection that explicitly discriminates the information relevant to an ongoing action from the rest. Our unit, named Information Discrimination Unit (IDU), decides whether to accumulate input information based on its relevance to the current action. This enables our recurrent network with IDU to learn a more discriminative representation for identifying ongoing actions. In experiments on two benchmark datasets, TVSeries and THUMOS-14, the proposed method outperforms state-of-the-art methods by a significant margin. Moreover, we demonstrate the effectiveness of our recurrent unit through comprehensive ablation studies.
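The gating idea in the abstract can be illustrated with a minimal sketch. This is not the paper's actual IDU formulation, which the abstract does not specify. The class name `RelevanceGatedUnit`, the weight names, the random initialization, and the GRU-like update rule are all assumptions made for illustration. The only point carried over from the text is that the gate sees the current frame's features alongside each past input, so accumulation can depend on relevance to the ongoing action.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class RelevanceGatedUnit:
    """Hypothetical recurrent unit: a gate conditioned on the current frame
    decides how much of each past input is accumulated into the state."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            # Small random weights; a real model would learn these.
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        # Gate input: past frame, current frame, and previous hidden state.
        self.W_gate = mat(hid_dim, 2 * in_dim + hid_dim)
        # Candidate input: past frame and previous hidden state.
        self.W_cand = mat(hid_dim, in_dim + hid_dim)

    def step(self, h_prev, x_t, x_cur):
        gate_in = x_t + x_cur + h_prev  # list concatenation
        cand_in = x_t + h_prev
        h_new = []
        for j in range(self.hid_dim):
            r = sigmoid(dot(self.W_gate[j], gate_in))    # relevance gate in (0, 1)
            c = math.tanh(dot(self.W_cand[j], cand_in))  # candidate state
            # r near 0 keeps the old state; r near 1 accumulates the input.
            h_new.append((1.0 - r) * h_prev[j] + r * c)
        return h_new

    def encode(self, frames):
        """frames: list of feature vectors (lists); the last one is the current frame."""
        x_cur = frames[-1]
        h = [0.0] * self.hid_dim
        for x_t in frames:
            h = self.step(h, x_t, x_cur)
        return h


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
    unit = RelevanceGatedUnit(in_dim=4, hid_dim=3)
    print(unit.encode(frames))
```

In a full model, the final state returned by `encode` would feed a classifier over action classes, and the weights would be trained end to end; both steps are omitted here.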