CV · MM · Jan 4, 2024

TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection

arXiv:2401.02309v2 · 88 citations · h-index: 7 · Has Code · AAAI
AI Analysis

This work addresses efficient retrieval of relevant moments and detection of highlights in videos from natural language queries, with applications in video analysis and summarization. It is an incremental improvement over existing DETR-based methods.

The paper tackles joint video moment retrieval and highlight detection by proposing TR-DETR, a task-reciprocal transformer that exploits the inherent reciprocity between the two tasks. It reports state-of-the-art performance on the QVHighlights, Charades-STA, and TVSum datasets.

Video moment retrieval (MR) and highlight detection (HD) based on natural language queries are two highly related tasks, which aim to obtain the relevant moments within a video and a highlight score for each video clip, respectively. Recently, several methods have built DETR-based networks to solve MR and HD jointly. These methods simply add two separate task heads after multi-modal feature extraction and feature interaction, and achieve good performance. Nevertheless, these approaches underutilize the reciprocal relationship between the two tasks. In this paper, we propose a task-reciprocal transformer based on DETR (TR-DETR) that focuses on exploring the inherent reciprocity between MR and HD. Specifically, a local-global multi-modal alignment module is first built to align features from diverse modalities into a shared latent space. Subsequently, a visual feature refinement module is designed to eliminate query-irrelevant information from visual features before modal interaction. Finally, a task cooperation module is constructed to refine the retrieval pipeline and the highlight score prediction process by utilizing the reciprocity between MR and HD. Comprehensive experiments on the QVHighlights, Charades-STA, and TVSum datasets demonstrate that TR-DETR outperforms existing state-of-the-art methods. Code is available at https://github.com/mingyao1120/TR-DETR.
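To make the relationship between the two task outputs concrete, the sketch below shows how per-clip highlight scores (the HD output) and a moment span in seconds (the MR output) can be converted into one another. This is an illustration only, not the TR-DETR task cooperation module: the function names, the 2-second clip length, and the thresholding rule are all assumptions introduced here.

```python
from typing import List, Tuple

Moment = Tuple[float, float]  # (start, end) in seconds


def scores_to_moment(scores: List[float], clip_len: float, threshold: float) -> Moment:
    """Hypothetical helper: longest run of clips with score >= threshold, as a span in seconds."""
    best = (0, 0)  # half-open clip index range [start, end)
    run_start = None
    # A sentinel below any threshold closes a run that reaches the last clip.
    for i, s in enumerate(scores + [float("-inf")]):
        if s >= threshold and run_start is None:
            run_start = i
        elif s < threshold and run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best[0] * clip_len, best[1] * clip_len


def moment_to_clip_mask(moment: Moment, num_clips: int, clip_len: float) -> List[bool]:
    """Hypothetical helper: mark every clip that overlaps the moment span."""
    start, end = moment
    return [start < (i + 1) * clip_len and end > i * clip_len for i in range(num_clips)]


def temporal_iou(a: Moment, b: Moment) -> float:
    """Temporal intersection-over-union of two spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Assumed toy data: six clips of 2 seconds each.
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
moment = scores_to_moment(scores, clip_len=2.0, threshold=0.5)
mask = moment_to_clip_mask(moment, num_clips=len(scores), clip_len=2.0)
```

In this toy example the high-scoring clips form the span from 4 to 10 seconds, and the mask derived from that span selects the same three clips, which is the kind of agreement between MR and HD outputs that the abstract refers to as reciprocity.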

Code Implementations: 1 repo
