Action Anticipation with Goal Consistency
This work addresses action anticipation for video analysis applications. It proposes a new method for a known bottleneck and is best read as an incremental improvement over prior approaches.
The paper tackles short-term action anticipation: predicting an upcoming action one second before it occurs. It uses high-level intent information and a consistency loss to align anticipated actions with the goal pursued in the video, and reports state-of-the-art results on the Assembly101 and COIN datasets.
In this paper, we address the problem of short-term action anticipation, i.e., we want to predict an upcoming action one second before it happens. We propose to harness high-level intent information to anticipate actions that will take place in the future. To this end, we incorporate an additional goal prediction branch into our model and propose a consistency loss function that encourages the anticipated actions to conform to the high-level goal pursued in the video. In our experiments, we show the effectiveness of the proposed approach and demonstrate that our method achieves state-of-the-art results on two large-scale datasets: Assembly101 and COIN.
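The abstract does not give the form of the consistency loss, so the following is only a minimal sketch of one way such a term could look, not the paper's formulation. It assumes a hypothetical binary compatibility table `compat[g][a]` that marks which actions belong to which goal, and it penalizes probability mass that the action branch places on actions incompatible with the goal branch's prediction. The names `consistency_loss`, `compat`, and the toy logits are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(action_logits, goal_logits, compat, eps=1e-8):
    """Illustrative goal-consistency penalty (an assumption, not the paper's loss).

    compat[g][a] is 1 if action a is compatible with goal g, else 0.
    The loss is the negative log of the joint probability mass that the two
    branches assign to compatible (goal, action) pairs.
    """
    p_action = softmax(action_logits)
    p_goal = softmax(goal_logits)
    mass = 0.0
    for g, pg in enumerate(p_goal):
        for a, pa in enumerate(p_action):
            mass += pg * pa * compat[g][a]
    return -math.log(mass + eps)


# Hypothetical toy setup: 2 goals, 4 actions.
# Actions 0 and 1 belong to goal 0; actions 2 and 3 belong to goal 1.
COMPAT = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

# Both branches agree on goal 0: the anticipated action (0) fits the goal.
consistent = consistency_loss([4.0, 0.0, 0.0, 0.0], [4.0, 0.0], COMPAT)

# Goal branch predicts goal 0, but the anticipated action (2) fits goal 1.
inconsistent = consistency_loss([0.0, 0.0, 4.0, 0.0], [4.0, 0.0], COMPAT)

print(f"consistent:   {consistent:.4f}")
print(f"inconsistent: {inconsistent:.4f}")
```

In a training setup, a term like this would typically be added, with some weight, to the usual classification losses for the action and goal branches. The weighting and the exact loss used in the paper are not stated in the abstract.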