CV · May 22, 2024

GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval

arXiv:2405.13824v1 · 12 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the retrieval of untrimmed videos that contain moments relevant to a text query, a problem with applications in video search and analysis. It represents an incremental advance over prior methods.
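To make the task concrete, here is a minimal Python sketch of how a PRVR system might rank videos once query and clip embeddings exist. It assumes, as is common in PRVR work but not stated on this page, that a video's relevance equals the similarity of its best-matching clip. The function names and toy 2-D embeddings are hypothetical, and this is not GMMFormer v2's actual scoring function.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def video_score(query: Vector, clips: List[Vector]) -> float:
    """Assumed PRVR relevance: similarity of the best-matching clip."""
    return max(cosine(query, c) for c in clips)


def rank_videos(query: Vector, videos: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Rank videos (id -> list of clip embeddings) by descending relevance."""
    scored = [(vid, video_score(query, clips)) for vid, clips in videos.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Video "a" has one clip that matches the query exactly; "b" matches only partly.
    toy = {"a": [[0.0, 1.0], [1.0, 0.0]], "b": [[0.6, 0.8]]}
    print(rank_videos([1.0, 0.0], toy))
```

Taking the maximum over clips is what makes retrieval "partially relevant": a long video ranks highly if any single moment matches, even when most of its clips are unrelated to the query.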

The paper tackles partially relevant video retrieval (PRVR) with GMMFormer v2, an uncertainty-aware framework that improves clip modeling and text-clip matching to counter problems such as semantic collapse. It reports substantial improvements over the previous state of the art on three benchmarks.
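The page does not say how the temporal consolidation module over multi-scale contextual features actually works. Purely as an illustration of the idea, the sketch below is loosely inspired by the Gaussian-mixture idea behind GMMFormer's name: each time step aggregates its neighboring frame features with Gaussian weights at several widths (the `sigmas` values are arbitrary), and the scales are then "consolidated" by a plain element-wise mean. Both the pooling and the averaging are hypothetical stand-ins, not the paper's module.

```python
import math
from typing import List, Sequence

Vector = List[float]


def gaussian_context(frames: List[Vector], sigma: float) -> List[Vector]:
    """Gaussian-weighted average of all frame features around each time step."""
    n, d = len(frames), len(frames[0])
    out = []
    for t in range(n):
        # Nearby frames get larger weights; sigma controls the temporal scale.
        weights = [math.exp(-((s - t) ** 2) / (2 * sigma ** 2)) for s in range(n)]
        total = sum(weights)
        out.append([sum(w * f[k] for w, f in zip(weights, frames)) / total for k in range(d)])
    return out


def consolidate(frames: List[Vector], sigmas: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Vector]:
    """Hypothetical consolidation: element-wise mean of the multi-scale context features."""
    scales = [gaussian_context(frames, s) for s in sigmas]
    n, d = len(frames), len(frames[0])
    return [[sum(sc[t][k] for sc in scales) / len(scales) for k in range(d)] for t in range(n)]
```

Small widths keep short moments sharp while large widths smooth over longer events, so combining several scales is one simple way to cover moments of varying length; a learned fusion would replace the fixed mean in practice.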

Given a text query, partially relevant video retrieval (PRVR) aims to retrieve untrimmed videos containing relevant moments. Because moment annotations are unavailable, the uncertainty in clip modeling and in text-clip correspondence poses major challenges. Despite considerable progress, existing solutions sacrifice either efficiency or efficacy to capture varying and uncertain video moments. Moreover, few methods have examined the text-clip matching pattern under such uncertainty, which exposes them to the risk of semantic collapse. To address these issues, we present GMMFormer v2, an uncertainty-aware framework for PRVR. For clip modeling, we improve the strong baseline GMMFormer with a novel temporal consolidation module built on multi-scale contextual features, which maintains efficiency and improves the perception of varying moments. To achieve uncertainty-aware text-clip matching, we upgrade the query diverse loss in GMMFormer to encourage fine-grained uniformity and propose a novel optimal matching loss for fine-grained text-clip alignment. Together, these two losses alleviate semantic collapse and promote accurate correspondence between texts and moments. We conduct extensive experiments and ablation studies on three PRVR benchmarks, demonstrating a remarkable improvement of GMMFormer v2 over the previous state-of-the-art competitor and the versatility of uncertainty-aware text-clip matching for PRVR. Code is available at https://github.com/huangmozhi9527/GMMFormer_v2.
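The abstract names an upgraded query diverse loss and an optimal matching loss but does not give their form. The sketch below is one hypothetical reading, in plain Python for small inputs: an "optimal matching" term that assigns each text query of a video to a distinct clip so that total cosine similarity is maximal (brute force here; a real system would more likely use the Hungarian algorithm or a differentiable relaxation), and a hinge-style diversity penalty that discourages queries of the same video from collapsing onto one another. Function names, the `margin` parameter, and both loss definitions are assumptions, not the paper's equations.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def optimal_matching(sim: List[List[float]]) -> List[Tuple[int, int]]:
    """One-to-one (text, clip) assignment maximizing total similarity.

    Brute force over ordered clip selections, so only suitable for a handful
    of texts per video.
    """
    if not sim:
        return []
    n_text, n_clip = len(sim), len(sim[0])
    if n_text > n_clip:
        raise ValueError("need at least as many clips as texts")
    best_total, best_pairs = -math.inf, []
    for perm in itertools.permutations(range(n_clip), n_text):
        total = sum(sim[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    return best_pairs


def matching_loss(texts: List[Vector], clips: List[Vector]) -> float:
    """Hypothetical matching loss: mean (1 - cosine) over optimally matched pairs."""
    sim = [[cosine(t, c) for c in clips] for t in texts]
    pairs = optimal_matching(sim)
    return sum(1.0 - sim[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0


def diversity_penalty(texts: List[Vector], margin: float = 0.0) -> float:
    """Hypothetical diversity term: mean hinge on pairwise text-text cosine similarity."""
    pairs = list(itertools.combinations(range(len(texts)), 2))
    if not pairs:
        return 0.0
    return sum(max(0.0, cosine(texts[i], texts[j]) - margin) for i, j in pairs) / len(pairs)
```

For two identical queries paired with two orthogonal clips, the diversity penalty reaches 1.0 and the one-to-one matching forces one query onto a poorly matching clip (matching loss 0.5). That is the kind of collapse signal such losses are meant to expose.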

Code Implementations: 1 repo