Incremental Tube Construction for Human Action Detection
This work addresses the need for efficient, real-time action detection in dynamic environments and targets applications that require immediate processing; its contribution builds incrementally on existing methods.
The paper tackles real-time action detection in videos for online applications such as human-robot interaction. It introduces an algorithm that jointly labels and associates detection windows to incrementally construct action tubes, and it reports superior online association accuracy and a speed of 2.2 ms per frame compared with offline state-of-the-art systems.
Current state-of-the-art action detection systems are tailored for offline batch-processing applications. However, for online applications like human-robot interaction, current systems fall short, either because they only detect one action per video, or because they assume that the entire video is available ahead of time. In this work, we introduce a real-time and online joint-labelling and association algorithm for action detection that can incrementally construct space-time action tubes on the most challenging action videos in which different action categories occur concurrently. In contrast to previous methods, we solve the detection-window association and action labelling problems jointly in a single pass. We demonstrate superior online association accuracy and speed (2.2ms per frame) as compared to the current state-of-the-art offline systems. We further demonstrate that the entire action detection pipeline can easily be made to work effectively in real-time using our action tube construction algorithm.
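To make the idea of single-pass, incremental tube construction concrete, the following is a minimal sketch and not the algorithm from this work. It assumes per-frame detections arrive as (box, per-class scores) pairs, and it uses a greedy match that multiplies box overlap (IoU) by the detection's score for the tube's current label. The names `Tube`, `update_tubes`, and `iou_thresh`, the threshold value, and the scoring rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tube:
    boxes: list = field(default_factory=list)         # [(frame_idx, box), ...]
    label_scores: dict = field(default_factory=dict)  # label -> accumulated score

    @property
    def label(self):
        # Current tube label: the class with the highest accumulated score.
        return max(self.label_scores, key=self.label_scores.get)

    def add(self, frame_idx, box, scores):
        self.boxes.append((frame_idx, box))
        for name, s in scores.items():
            self.label_scores[name] = self.label_scores.get(name, 0.0) + s


def update_tubes(tubes, frame_idx, detections, iou_thresh=0.3):
    """Extend tubes with one frame of detections (assumed greedy rule).

    detections: list of (box, {label: score}) for the current frame.
    Only the previous frame's tube state is needed, so the video is
    processed in one pass without access to future frames.
    """
    unmatched = list(range(len(detections)))
    for tube in tubes:
        last_frame, last_box = tube.boxes[-1]
        if last_frame != frame_idx - 1:
            continue  # tube was not extended in the previous frame; treat as ended
        best_j, best_score = None, 0.0
        for j in unmatched:
            box, scores = detections[j]
            overlap = iou(last_box, box)
            if overlap < iou_thresh:
                continue
            # Association and labelling are scored together (illustrative choice).
            joint = overlap * scores.get(tube.label, 0.0)
            if joint > best_score:
                best_j, best_score = j, joint
        if best_j is not None:
            tube.add(frame_idx, *detections[best_j])
            unmatched.remove(best_j)
    # Detections that matched no active tube start new tubes.
    for j in unmatched:
        new_tube = Tube()
        new_tube.add(frame_idx, *detections[j])
        tubes.append(new_tube)
    return tubes
```

Calling `update_tubes` once per incoming frame keeps one tube per concurrent action instance, and each tube's label can change as class scores accumulate. A greedy match is used here only for brevity; the abstract does not state how the joint problem is actually solved.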