HC, CV | May 23, 2025

ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization

Yuchen He, Jianbing Lv, Liqi Cheng, Lingyu Meng, Dazhen Deng, Yingcai Wu

arXiv:2505.17555v1 | 3 citations | h-index: 14 | CHI

Originality: Incremental advance

AI Analysis

This work addresses the data annotation bottleneck faced by researchers and practitioners in video analysis, though the advance is incremental because it builds on existing data programming methods.

The paper tackles the problem of reducing manual annotation effort for Temporal Action Localization (TAL). It proposes ProTAL, a drag-and-link video programming framework in which users define key events that are used to generate action labels for unlabeled videos, and it demonstrates the framework's effectiveness through a usage scenario and a user study.

Temporal Action Localization (TAL) aims to detect the start and end timestamps of actions in a video. However, training TAL models requires a substantial amount of manually annotated data. Data programming is an efficient method for creating training labels with a series of human-defined labeling functions. However, applying it to TAL is difficult because complex actions must be defined in the context of temporal video frames. In this paper, we propose ProTAL, a drag-and-link video programming framework for TAL. ProTAL enables users to define key events by dragging nodes representing body parts and objects and linking them to constrain their relations (direction, distance, etc.). These definitions are used to generate action labels for large-scale unlabeled videos. A semi-supervised method is then employed to train TAL models with such labels. We demonstrate the effectiveness of ProTAL through a usage scenario and a user study, providing insights into designing video programming frameworks.
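To make the idea of a key event concrete, the following is a minimal sketch of how a distance constraint between two nodes might act as a labeling function that yields start and end timestamps. It is not ProTAL's implementation: the names `within_distance` and `label_segments`, the per-frame dictionary of node positions, and the pixel threshold are all assumptions introduced for illustration.

```python
from math import hypot

# Hypothetical per-frame detections: node name -> (x, y) position in pixels.
Frame = dict[str, tuple[float, float]]


def within_distance(a: str, b: str, max_dist: float):
    """Build a key-event predicate: nodes a and b are at most max_dist apart."""
    def predicate(frame: Frame) -> bool:
        if a not in frame or b not in frame:
            return False  # treat a missing detection as "event not observed"
        (ax, ay), (bx, by) = frame[a], frame[b]
        return hypot(ax - bx, ay - by) <= max_dist
    return predicate


def label_segments(frames: list[Frame], predicate, fps: float) -> list[tuple[float, float]]:
    """Merge consecutive frames satisfying the predicate into (start, end) times in seconds."""
    segments = []
    start = None
    for i, frame in enumerate(frames):
        if predicate(frame):
            if start is None:
                start = i  # a new segment opens at this frame
        elif start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:
        # The event is still active when the video ends.
        segments.append((start / fps, len(frames) / fps))
    return segments


# Illustrative data only: the hand is near the ball in frames 1 and 2.
frames = [
    {"hand": (0.0, 0.0), "ball": (100.0, 0.0)},
    {"hand": (0.0, 0.0), "ball": (10.0, 0.0)},
    {"hand": (0.0, 0.0), "ball": (5.0, 0.0)},
    {"hand": (0.0, 0.0)},
]
hand_near_ball = within_distance("hand", "ball", max_dist=20.0)
segments = label_segments(frames, hand_near_ball, fps=1.0)
print(segments)
```

Under these assumptions, a direction constraint could be expressed as another predicate of the same shape, and the resulting segments would serve as noisy labels for the semi-supervised training step the abstract describes.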
