Full Resolution Repetition Counting
This work addresses the challenge of accurately counting class-agnostic repetitive actions in videos, a task relevant to applications such as sports analysis and surveillance. The contribution is incremental: it builds on existing methods and focuses on temporal resolution.
The paper tackles repetitive action counting in untrimmed videos with a method that operates at full temporal resolution, avoiding the down-sampling that often causes repetitions to be missed. It reports better or comparable performance on three public datasets: TransRAC, UCFRep, and QUVA.
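The claim that down-sampling can miss repetitions can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a hypothetical binary per-frame signal (1 during the first half of each repetition, 0 otherwise) and counts repetitions as 0-to-1 transitions. All function names are made up for illustration.

```python
def make_phase_signal(num_reps, period):
    """Toy per-frame label: 1 during the first half of each repetition, else 0."""
    half = period // 2
    one_cycle = [1] * half + [0] * (period - half)
    return one_cycle * num_reps


def count_repetitions(signal):
    """Count 0 -> 1 transitions; a signal that starts with 1 counts as one."""
    count = 0
    previous = 0
    for value in signal:
        if value == 1 and previous == 0:
            count += 1
        previous = value
    return count


def downsample(signal, stride):
    """Keep every `stride`-th frame, as uniform temporal down-sampling would."""
    return signal[::stride]


# Ten repetitions, each lasting four frames (assumed toy values).
frames = make_phase_signal(num_reps=10, period=4)
full_count = count_repetitions(frames)
coarse_count = count_repetitions(downsample(frames, stride=4))
print(full_count, coarse_count)
```

In this toy setting, a stride equal to the action period samples the same phase of every cycle, so the sampled signal looks constant and the individual repetitions are no longer distinguishable. Real videos are far less regular, but fast actions in long videos are affected in the same way when the stride is large relative to the period.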
Given an untrimmed video, repetitive action counting aims to estimate the number of repetitions of class-agnostic actions. To handle the varying lengths of videos and repetitive actions, as well as the optimization challenges of end-to-end video model training, recent state-of-the-art methods commonly rely on down-sampling, which causes some repetitions to be missed. In this paper, we attempt to understand repetitive actions from a full temporal resolution view by combining offline feature extraction with temporal convolution networks. The former enables us to train the repetition counting network without down-sampling while preserving all repetitions regardless of video length and action frequency; the latter models all frames with a flexible, dynamically expanding temporal receptive field to retrieve all repetitions from a global perspective. We experimentally demonstrate that our method achieves better or comparable performance on three public datasets, i.e., TransRAC, UCFRep, and QUVA. We hope this work will encourage the community to consider the importance of full temporal resolution.
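One common way for a temporal convolution network to obtain an expanding receptive field is to stack dilated 1D convolutions. The abstract does not specify the layer configuration, so the sketch below is an assumption: it uses kernel size 3 and dilations that double per layer, and computes how many frames a single output position can see. The function names are hypothetical.

```python
def doubling_dilations(num_layers):
    """Assumed dilation schedule: 1, 2, 4, ... (one value per layer)."""
    return [2 ** i for i in range(num_layers)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated 1D convolutions, in frames.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def min_layers_to_cover(num_frames, kernel_size=3):
    """Smallest number of doubling-dilation layers whose field spans the video."""
    num_layers = 0
    while receptive_field(kernel_size, doubling_dilations(num_layers)) < num_frames:
        num_layers += 1
    return num_layers


# Receptive field after each additional layer (kernel size 3 is an assumption).
for depth in range(1, 11):
    print(depth, receptive_field(3, doubling_dilations(depth)))
```

Under these assumptions the receptive field grows exponentially with depth while the number of layers grows only linearly, so about ten layers already span roughly two thousand frames without discarding any of them.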