CV · Dec 9, 2024

Towards Long Video Understanding via Fine-detailed Video Story Generation

arXiv:2412.06182v2 · 13 citations · h-index: 14 · IEEE Transactions on Circuits and Systems for Video Technology (Print)
AI Analysis

This work addresses long video understanding, a task with applications ranging from surveillance to content retrieval, and presents a method that can be applied to downstream tasks without fine-tuning.

The paper introduces Fine-Detailed Video Story generation (FDVS), which converts long videos into hierarchical textual representations to address two challenges: long-context relationship modeling and interference from redundancy. The method is evaluated on eight datasets spanning three tasks.

Long video understanding has become a critical task in computer vision, driving advancements across numerous applications from surveillance to content retrieval. Existing video understanding methods face two challenges on long videos: intricate long-context relationship modeling and interference from redundancy. To tackle these challenges, we introduce Fine-Detailed Video Story generation (FDVS), which interprets long videos into detailed textual representations. Specifically, to achieve fine-grained modeling of long-temporal content, we propose a Bottom-up Video Interpretation Mechanism that progressively interprets video content from clips to the full video. To avoid interference from redundant information in videos, we introduce a Semantic Redundancy Reduction mechanism that removes redundancy at both the visual and textual levels. Our method transforms long videos into hierarchical textual representations that contain multi-granularity information about the video. With these representations, FDVS is applicable to various tasks without any fine-tuning. We evaluate the proposed method across eight datasets spanning three tasks. The results demonstrate the effectiveness and versatility of our method.
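The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not say how redundancy is measured or how clips are grouped, so the cosine-similarity threshold, the fixed `chapter_size`, and every name below (`drop_redundant`, `build_story`, `VideoStory`, and the `clip_feature`, `caption`, `text_embed`, `summarize` callables) are assumptions made for illustration. In practice the callables would wrap a visual encoder, a captioning model, a text encoder, and a language model.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Any, Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_redundant(items: List[Any],
                   embed: Callable[[Any], Sequence[float]],
                   threshold: float) -> List[Any]:
    """Keep an item only if it differs enough from the most recently kept item."""
    kept: List[Any] = []
    last_vec = None
    for item in items:
        vec = embed(item)
        if last_vec is not None and cosine(vec, last_vec) >= threshold:
            continue  # assumed redundancy criterion: near-duplicate of previous kept item
        kept.append(item)
        last_vec = vec
    return kept


@dataclass
class VideoStory:
    """Hypothetical container for a multi-granularity textual representation."""
    clip_captions: List[str]      # finest level: one text per kept clip
    chapter_summaries: List[str]  # middle level: one text per group of clips
    story: str                    # coarsest level: one text for the whole video


def build_story(clips: List[Any],
                clip_feature: Callable[[Any], Sequence[float]],
                caption: Callable[[Any], str],
                text_embed: Callable[[str], Sequence[float]],
                summarize: Callable[[List[str]], str],
                chapter_size: int = 2,
                threshold: float = 0.9) -> VideoStory:
    # Visual level: discard clips whose features nearly repeat the previous kept clip.
    clips = drop_redundant(clips, clip_feature, threshold)
    # Interpret each remaining clip as text.
    captions = [caption(c) for c in clips]
    # Textual level: discard captions that nearly repeat the previous kept caption.
    captions = drop_redundant(captions, text_embed, threshold)
    # Bottom-up: clip captions -> chapter summaries -> whole-video story.
    chapters = [summarize(captions[i:i + chapter_size])
                for i in range(0, len(captions), chapter_size)]
    return VideoStory(captions, chapters, summarize(chapters))
```

Because the output is plain text at several granularities, a downstream task such as retrieval or question answering could consume whichever level suits it, which is one way to read the claim that the representations are usable without fine-tuning.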
