CL
Sep 14, 2024

Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM

arXiv:2409.09362v1
1 citation
h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses event attribution in movies, motivated by social media and semantic analysis services, and represents an incremental improvement over prior clip-level methods.

The paper tackles the challenge of analyzing causal relationships between events across entire movies, which existing methods struggle with due to limited context in multimodal large language models, and proposes a two-stage prefix-enhanced approach that outperforms state-of-the-art methods on real-world datasets.

The prosperity of social media platforms has raised the urgent demand for semantic-rich services, e.g., event and storyline attribution. However, most existing research focuses on clip-level event understanding, primarily through basic captioning tasks, without analyzing the causes of events across an entire movie. This is a significant challenge, as even advanced multimodal large language models (MLLMs) struggle with extensive multimodal information due to limited context length. To address this issue, we propose a Two-Stage Prefix-Enhanced MLLM (TSPE) approach for event attribution, i.e., connecting associated events with their causal semantics, in movie videos. In the local stage, we introduce an interaction-aware prefix that guides the model to focus on the relevant multimodal information within a single clip, briefly summarizing the single event. Correspondingly, in the global stage, we strengthen the connections between associated events using an inferential knowledge graph, and design an event-aware prefix that directs the model to focus on associated events rather than all preceding clips, resulting in accurate event attribution. Comprehensive evaluations on two real-world datasets demonstrate that our framework outperforms state-of-the-art methods.
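The two-stage idea in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation: the paper's prefixes steer an MLLM, whereas this sketch substitutes plain-text context for them and only shows the selection step of the global stage, where the model is given associated events instead of every preceding clip. All names here (`Event`, `summarize_clip`, `associated_events`, `build_global_prompt`) and the toy `graph` of event links are hypothetical, and the local stage is reduced to a stand-in that passes a clip description through unchanged.

```python
from dataclasses import dataclass


@dataclass
class Event:
    clip_id: int
    summary: str


def summarize_clip(clip_id: int, clip_description: str) -> Event:
    """Local-stage stand-in (assumption): a real system would ask an MLLM
    to summarize the clip; here the description is passed through."""
    return Event(clip_id, clip_description.strip())


def associated_events(target: Event, events: list[Event],
                      graph: dict[int, set[int]]) -> list[Event]:
    """Keep only earlier events that the (hypothetical) graph links to the target."""
    linked = graph.get(target.clip_id, set())
    return [e for e in events
            if e.clip_id in linked and e.clip_id < target.clip_id]


def build_global_prompt(target: Event, events: list[Event],
                        graph: dict[int, set[int]]) -> str:
    """Global-stage stand-in: build a short context from associated events only,
    rather than from all preceding clips."""
    lines = [f"[clip {e.clip_id}] {e.summary}"
             for e in associated_events(target, events, graph)]
    lines.append(f"Explain the cause of: [clip {target.clip_id}] {target.summary}")
    return "\n".join(lines)


# Toy data, invented for illustration only.
clips = {
    0: "A borrows money from a loan shark",
    1: "B cooks dinner",
    2: "A flees town at night",
}
graph = {2: {0}}  # clip 2 is linked to clip 0, not to clip 1

events = [summarize_clip(i, text) for i, text in clips.items()]
prompt = build_global_prompt(events[2], events, graph)
print(prompt)
```

In this toy run the unrelated clip 1 is left out of the context for clip 2, which is the property the abstract attributes to the event-aware prefix. How the graph is built and how the prefixes are injected into the model are not specified in the abstract and are not modeled here.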

