Action Completion: A Temporal Model for Moment Detection
This work addresses the challenge of accurately identifying action completion moments in video. The contribution is incremental: it builds on existing moment detection methods with a novel model.
The paper tackles completion moment detection, the problem of locating the moment at which an action's goal is confidently achieved. It proposes a joint classification-regression recurrent model that integrates frame-level predictions to detect the sequence-level completion moment, and detects that moment within one second in 89% of tested sequences.
We introduce completion moment detection for actions: the problem of locating the moment of completion, when the action's goal is confidently considered achieved. We propose a joint classification-regression recurrent model that predicts completion from a given frame and then integrates frame-level contributions to detect the sequence-level completion moment. We introduce a recurrent voting node that predicts, for each frame, the relative position of the completion moment by either classification or regression. The method can also detect incompletion: for example, it can detect a missed ball catch as well as the moment at which the ball is safely caught. We test the method on 16 actions from three public datasets, covering sports as well as daily actions. Results show that when contributions from frames both before and after the completion moment are combined, the completion moment is detected within one second in 89% of all tested sequences.
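To make the idea of integrating frame-level contributions concrete, the following is a minimal sketch, not the authors' implementation. It assumes the regression variant of the voting node, where each frame t outputs a signed offset (in frames) to the completion moment, so frame t votes for frame t + offset. The Gaussian smoothing of votes, the `sigma` parameter, and the helper names `integrate_votes` and `within_tolerance_rate` are hypothetical choices for illustration. A classification variant could instead add each frame's probability mass per relative-position bin. The second helper mirrors the "within one second" criterion used to report results.

```python
import math
from typing import List, Sequence


def integrate_votes(offsets: Sequence[float], num_frames: int, sigma: float = 2.0) -> int:
    """Combine per-frame regression votes into one sequence-level completion frame.

    offsets[t] is a (hypothetical) regressed signed distance, in frames, from
    frame t to the completion moment, so frame t votes for frame t + offsets[t].
    Each vote is spread with a Gaussian of width `sigma` (an assumed smoothing
    choice) and the frame with the highest accumulated score is returned.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    scores: List[float] = [0.0] * num_frames
    for t, d in enumerate(offsets):
        target = t + d  # frame this vote points at; may fall outside the clip
        for f in range(num_frames):
            scores[f] += math.exp(-((f - target) ** 2) / (2 * sigma ** 2))
    # Ties resolve to the earliest frame.
    return max(range(num_frames), key=lambda f: scores[f])


def within_tolerance_rate(
    preds: Sequence[int], gts: Sequence[int], fps: float, tol_seconds: float = 1.0
) -> float:
    """Fraction of sequences whose predicted frame is within tol_seconds of ground truth."""
    hits = sum(1 for p, g in zip(preds, gts) if abs(p - g) / fps <= tol_seconds)
    return hits / len(gts)
```

For instance, in a 10-frame clip where nine frames vote for frame 6 and one outlier frame votes for frame 9, `integrate_votes` still returns 6, because votes from frames both before and after the completion moment reinforce the same peak. This reflects, in simplified form, why combining pre- and post-completion frames helps.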