CV · Dec 12, 2024

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

U of Toronto
arXiv:2412.08879v2 · 10 citations · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for automated video repurposing for social media users. Its contribution is incremental: it builds on existing video summarization methods and adds a new dataset and a baseline.

The authors tackle the problem of creating short-form videos from user-generated content. They introduce Repurpose-10K, a large-scale dataset with over 10,000 videos and more than 120,000 annotated clips, and propose a baseline model for the video long-to-short task that integrates audio, visual, and caption information through cross-modal fusion and alignment.

The demand for producing short-form videos for sharing on social media platforms has experienced significant growth in recent times. Despite notable advancements in the fields of video summarization and highlight detection, which can create partially usable short films from raw videos, these approaches are often domain-specific and require an in-depth understanding of real-world video content. To tackle this predicament, we propose Repurpose-10K, an extensive dataset comprising over 10,000 videos with more than 120,000 annotated clips aimed at resolving the video long-to-short task. Recognizing the inherent constraints posed by untrained human annotators, which can result in inaccurate annotations for repurposed videos, we propose a two-stage solution to obtain annotations from real-world user-generated content. Furthermore, we offer a baseline model to address this challenging task by integrating audio, visual, and caption aspects through a cross-modal fusion and alignment framework. We aspire for our work to ignite groundbreaking research in the lesser-explored realms of video repurposing.
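
To make the idea of cross-modal fusion more concrete, the sketch below shows one generic way per-segment visual features could gather context from audio and caption features before each segment is scored. It is an illustration under stated assumptions, not the authors' baseline: the attention-based fusion, the linear scoring head, the feature dimensions, the random inputs, and the function names (`cross_attend`, `fuse_modalities`, `score_segments`) are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(query, context):
    """Scaled dot-product attention: each query row gathers a weighted
    average of the context rows. Shapes: (T_q, d), (T_c, d) -> (T_q, d)."""
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)  # (T_q, T_c)
    return weights @ context


def fuse_modalities(visual, audio, caption):
    """Hypothetical fusion step (an assumption, not the paper's model):
    visual segments attend to audio and caption features, and the results
    are concatenated per segment. Returns an array of shape (T, 3 * d)."""
    audio_ctx = cross_attend(visual, audio)
    caption_ctx = cross_attend(visual, caption)
    return np.concatenate([visual, audio_ctx, caption_ctx], axis=-1)


def score_segments(fused, w):
    """Hypothetical linear scoring head followed by a sigmoid, giving one
    keep-this-segment score in [0, 1] per segment."""
    return 1.0 / (1.0 + np.exp(-(fused @ w)))


# Random stand-ins for real features: 8 visual segments, 12 audio frames,
# 5 caption tokens, all projected to a shared 16-dimensional space.
rng = np.random.default_rng(0)
T, T_a, T_c, d = 8, 12, 5, 16
visual = rng.normal(size=(T, d))
audio = rng.normal(size=(T_a, d))
caption = rng.normal(size=(T_c, d))

fused = fuse_modalities(visual, audio, caption)
scores = score_segments(fused, rng.normal(size=3 * d))
print(fused.shape, scores.shape)
```

In a trained system the projection and scoring weights would be learned from annotated clips such as those in Repurpose-10K. Here they are random, so the scores carry no meaning beyond showing the data flow.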

Code Implementations: 1 repo