CV · Oct 15, 2024

It's Just Another Day: Unique Video Captioning by Discriminative Prompting

arXiv:2410.11702v1 · 3 citations · h-index: 43 · Int J Comput Vis
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of retrieving a specific clip from a long video with repetitive content. The advance is rated incremental because it builds on existing captioning methods.

The paper tackles the problem of generating unique captions for video clips that share identical descriptions, proposing Captioning by Discriminative Prompting (CDP) to improve text-to-video retrieval, resulting in a 15% increase in R@1 for egocentric videos and 10% for timeloop movies.

Long videos contain many repeating actions, events and shots. These repetitions are frequently given identical captions, which makes it difficult to retrieve the exact desired clip using a text search. In this paper, we formulate the problem of unique captioning: Given multiple clips with the same caption, we generate a new caption for each clip that uniquely identifies it. We propose Captioning by Discriminative Prompting (CDP), which predicts a property that can separate identically captioned clips, and use it to generate unique captions. We introduce two benchmarks for unique captioning, based on egocentric footage and timeloop movies - where repeating actions are common. We demonstrate that captions generated by CDP improve text-to-video R@1 by 15% for egocentric videos and 10% in timeloop movies.
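To make the retrieval metric concrete, here is a minimal sketch of text-to-video R@1 computed from a caption-to-clip similarity matrix. It is not the paper's evaluation code: the function name `recall_at_1`, the toy similarity scores, and the rule that a tie for the top score counts as a miss are all assumptions made for illustration.

```python
def recall_at_1(similarity):
    """Fraction of captions whose own clip is the single top-scoring clip.

    similarity[i][j] is the score of caption i against clip j, and the
    ground-truth clip for caption i is assumed to be clip i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        best = max(row)
        # Assumption: a tie for the top score counts as a miss, because the
        # caption does not single out one clip.
        if row[i] == best and row.count(best) == 1:
            hits += 1
    return hits / len(similarity)


# Hypothetical scores: clips 0 and 1 share one caption, so their rows match.
identical_captions = [
    [0.9, 0.9, 0.1],
    [0.9, 0.9, 0.1],
    [0.1, 0.2, 0.8],
]

# Hypothetical scores after each clip receives a distinguishing caption.
unique_captions = [
    [0.9, 0.4, 0.1],
    [0.3, 0.9, 0.1],
    [0.1, 0.2, 0.8],
]

r1_identical = recall_at_1(identical_captions)
r1_unique = recall_at_1(unique_captions)
```

In this toy setup, the two clips that share a caption cannot be told apart, so only the third caption retrieves its clip; once every caption is unique, each one retrieves its own clip.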

