CV · AI · CL · Dec 12, 2024

TimeRefine: Temporal Grounding with Time Refining Video LLM

arXiv:2412.09601v2 · 13 citations · h-index: 28
AI Analysis

This work addresses video temporal grounding, which supports applications such as video retrieval and analysis. The contribution is incremental: it adds a plug-and-play refinement approach on top of existing LLM-based methods.

The paper tackles the challenge of accurately localizing temporal boundaries in videos with Video LLMs. It proposes TimeRefine, which reformulates the task as a temporal refining process and adds an auxiliary prediction head, yielding mIoU improvements of 3.6% on ActivityNet and 5.0% on Charades-STA.
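For readers unfamiliar with the metric, mIoU here is the mean temporal intersection-over-union between predicted and ground-truth segments. The snippet below is a minimal sketch of that standard definition, not the paper's evaluation code; the function names `temporal_iou` and `mean_iou` and the `(start, end)` tuple layout in seconds are assumptions made for illustration.

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) segments, e.g. in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(preds, gts):
    """Average temporal IoU over paired predictions and ground truths."""
    scores = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predictions = [(0.0, 10.0), (0.0, 10.0)]
    ground_truths = [(0.0, 10.0), (5.0, 15.0)]
    print(mean_iou(predictions, ground_truths))
```

A prediction that matches the ground truth exactly scores 1.0, and a prediction that misses it entirely scores 0.0, so the reported gains reflect tighter boundary localization on average.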

Video temporal grounding aims to localize relevant temporal boundaries in a video given a textual prompt. Recent work has focused on enabling Video LLMs to perform video temporal grounding via next-token prediction of temporal timestamps. However, accurately localizing timestamps in videos remains challenging for Video LLMs when relying solely on temporal token prediction. Our proposed TimeRefine addresses this challenge in two ways. First, instead of directly predicting the start and end timestamps, we reformulate the temporal grounding task as a temporal refining task: the model first makes rough predictions and then refines them by predicting offsets to the target segment. This refining process is repeated multiple times, through which the model progressively self-improves its temporal localization accuracy. Second, to enhance the model's temporal perception capabilities, we incorporate an auxiliary prediction head that penalizes the model more if a predicted segment deviates further from the ground truth, thus encouraging the model to make closer and more accurate predictions. Our plug-and-play method can be integrated into most LLM-based temporal grounding approaches. The experimental results demonstrate that TimeRefine achieves 3.6% and 5.0% mIoU improvements on the ActivityNet and Charades-STA datasets, respectively. Code and pretrained models will be released.
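The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how rough predictions are generated, how many refinement steps are used, or which loss the auxiliary head applies. The code below assumes a hypothetical scheme in which rough segments are the ground truth plus noise that halves at each step, and it uses an L1 distance as one example of a penalty that grows as a prediction moves away from the target. All function names and parameters are illustrative.

```python
import random


def make_refinement_targets(gt, num_steps=3, max_noise=4.0, seed=0):
    """Build (rough_segment, offset) pairs for one ground-truth segment.

    Assumption: noise shrinks by half at each step, so later rough
    predictions sit closer to the target. Start/end ordering of the
    rough segment is not enforced in this sketch.
    """
    rng = random.Random(seed)
    gt_start, gt_end = gt
    steps = []
    for k in range(num_steps):
        scale = max_noise / (2 ** k)
        rough_start = gt_start + rng.uniform(-scale, scale)
        rough_end = gt_end + rng.uniform(-scale, scale)
        # The offset is what must be added to the rough segment to reach the target.
        offset = (gt_start - rough_start, gt_end - rough_end)
        steps.append(((rough_start, rough_end), offset))
    return steps


def apply_offset(segment, offset):
    """Refine a rough (start, end) segment with a predicted offset."""
    return (segment[0] + offset[0], segment[1] + offset[1])


def l1_segment_penalty(pred, gt):
    """Distance-sensitive penalty: larger deviations cost more."""
    return abs(pred[0] - gt[0]) + abs(pred[1] - gt[1])


if __name__ == "__main__":
    target = (10.0, 20.0)
    for rough, offset in make_refinement_targets(target):
        refined = apply_offset(rough, offset)
        print(rough, "->", refined, "penalty before:", l1_segment_penalty(rough, target))
```

Unlike a cross-entropy loss over discrete timestamp tokens, which treats every wrong token as equally wrong, a distance-based penalty of this kind distinguishes a near miss from a far miss, which is the behavior the abstract attributes to the auxiliary head.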

Code Implementations: 1 repo
