A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions
This work addresses a domain-specific problem for researchers in video understanding: it offers a tool that reduces the time and effort needed to customize spatio-temporal action datasets. The contribution is incremental, since it builds on existing detection and tracking methods.
The paper tackles two problems: existing large-scale spatio-temporal action datasets have limited applicability in specific fields, and no public tool exists for creating such datasets. It proposes a method that uses ffmpeg, YOLOv5, and Deep SORT to generate annotation files for custom datasets. No concrete performance numbers are provided.
Spatio-temporal action detection is an important and challenging problem in video understanding. However, existing large-scale spatio-temporal action datasets have limited applicability in specific fields, and there is currently no public tool for building such datasets, so researchers spend considerable time and effort customizing them. We therefore propose a multi-person video dataset annotation method for spatio-temporal actions. First, we use ffmpeg to crop the videos and extract frames; then we use YOLOv5 to detect the people in each video frame, and Deep SORT to assign an ID to each detected person. By processing the detection results of YOLOv5 and Deep SORT, we obtain the annotation file of the spatio-temporal action dataset, which completes the customization of the dataset. https://github.com/Whiffe/Custom-ava-dataset_Custom-Spatio-Temporally-Action-Video-Dataset
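The last step, turning tracked detections into annotation rows, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an AVA-style row layout (video ID, timestamp in seconds, normalized box corners, action ID, person ID), which is suggested only by the repository name, and it assumes that one keyframe per second is annotated. The names `TrackedBox` and `to_annotation_rows` are hypothetical, and the action ID is a placeholder that a human annotator would fill in later.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedBox:
    """One detector + tracker output (hypothetical structure)."""
    frame_index: int  # 0-based index of the extracted frame
    track_id: int     # person identity assigned by the tracker
    x1: float         # box corners in pixels
    y1: float
    x2: float
    y2: float


def to_annotation_rows(video_id, boxes, width, height, fps, placeholder_action=0):
    """Convert pixel-space tracked boxes into AVA-style rows.

    Assumption: only frames that fall on a whole second are keyframes.
    Coordinates are normalized to [0, 1] and rounded to 3 decimals.
    """
    def norm(value, size):
        # Clamp so boxes that spill past the frame edge stay in [0, 1].
        return round(min(max(value / size, 0.0), 1.0), 3)

    rows = []
    for box in sorted(boxes, key=lambda b: (b.frame_index, b.track_id)):
        if box.frame_index % fps != 0:
            continue  # not a keyframe under the 1-per-second assumption
        rows.append([
            video_id,
            box.frame_index // fps,  # timestamp in seconds
            norm(box.x1, width), norm(box.y1, height),
            norm(box.x2, width), norm(box.y2, height),
            placeholder_action,      # to be labeled by an annotator
            box.track_id,
        ])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text without a header line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up tracker output for a 1280x720 clip extracted at 30 fps.
example_boxes = [
    TrackedBox(0, 1, 128, 72, 640, 360),
    TrackedBox(15, 1, 130, 74, 642, 362),  # between keyframes, skipped
    TrackedBox(30, 2, 0, 0, 1280, 720),
]
example_rows = to_annotation_rows("clip01", example_boxes, 1280, 720, fps=30)
print(rows_to_csv(example_rows), end="")
```

Normalizing the coordinates keeps the annotation independent of the frame resolution, so the same file remains valid if the frames are later resized. The tracker ID in the last column is what links the same person across keyframes.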