CVAIIRMMOct 8, 2023

GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval

arXiv:2310.05195v233 citationsh-index: 11Has Code
Originality Incremental advance
AI Analysis

This work addresses efficiency and semantic sparsity issues in video retrieval for applications like multimedia search, though it is incremental as it builds on existing PRVR methods.

The paper tackles the efficiency problem in partially relevant video retrieval (PRVR) by proposing GMMFormer, which uses Gaussian-Mixture-Model constraints in a Transformer to implicitly model clips, reducing storage overhead and achieving state-of-the-art results on three large-scale datasets.

Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos containing pertinent moments in a database. For PRVR, clip modeling is essential to capture the partial relationship between texts and videos. Current PRVR methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and requires a large storage overhead. To solve the efficiency problem of PRVR methods, this paper proposes GMMFormer, a Gaussian-Mixture-Model based Transformer which models clip representations implicitly. During frame interactions, we incorporate Gaussian-Mixture-Model constraints to focus each frame on its adjacent frames instead of the whole video. Then generated representations will contain multi-scale clip information, achieving implicit clip modeling. In addition, PRVR methods ignore semantic differences between text queries relevant to the same video, leading to a sparse embedding space. We propose a query diverse loss to distinguish these text queries, making the embedding space more intensive and contain more semantic information. Extensive experiments on three large-scale video datasets (i.e., TVR, ActivityNet Captions, and Charades-STA) demonstrate the superiority and efficiency of GMMFormer. Code is available at \url{https://github.com/huangmozhi9527/GMMFormer}.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes