CV | Sep 27, 2024

Query matching for spatio-temporal action detection with query-based object detector

arXiv:2409.18408v1 | 1 citation | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses temporal inconsistency in video action detection for computer vision applications, representing an incremental advancement.

The paper tackles the problem of spatio-temporal action detection in videos by extending DETR with query matching to maintain temporal consistency, resulting in significant performance improvements on the JHMDB21 dataset.

In this paper, we propose a method that extends the query-based object detection model, DETR, to spatio-temporal action detection, which requires maintaining temporal consistency in videos. Our proposed method applies DETR to each frame and uses feature shift to incorporate temporal information. However, DETR's object queries in each frame may correspond to different objects, making a simple feature shift ineffective. To overcome this issue, we propose query matching across different frames, ensuring that queries for the same object are matched and used for the feature shift. Experimental results show that performance on the JHMDB21 dataset improves significantly when query features are shifted using the proposed query matching.
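The abstract's idea of matching queries across frames before shifting features can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the abstract does not say how queries are matched or how the shift is done, so this sketch uses greedy one-to-one matching on cosine similarity and a channel shift along the time axis. The names `match_queries`, `align_queries`, `shift_features`, and the `fold` parameter are hypothetical.

```python
import numpy as np


def match_queries(prev, curr, eps=1e-8):
    """Greedy one-to-one matching (assumed criterion: cosine similarity).

    prev, curr: (N, D) query features of two adjacent frames.
    Returns perm such that curr[perm[i]] is the query matched to prev[i].
    """
    p = prev / (np.linalg.norm(prev, axis=1, keepdims=True) + eps)
    c = curr / (np.linalg.norm(curr, axis=1, keepdims=True) + eps)
    sim = p @ c.T  # (N, N) pairwise cosine similarity
    n = sim.shape[0]
    perm = np.full(n, -1)
    for _ in range(n):
        # Take the most similar remaining pair, then retire its row and column.
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        perm[i] = j
        sim[i, :] = -np.inf
        sim[:, j] = -np.inf
    return perm


def align_queries(frames):
    """Reorder the queries of every frame to follow the ordering of frame 0.

    frames: (T, N, D) per-frame query features.
    """
    aligned = [frames[0]]
    for t in range(1, len(frames)):
        perm = match_queries(aligned[-1], frames[t])
        aligned.append(frames[t][perm])
    return np.stack(aligned)


def shift_features(frames, fold=4):
    """Assumed shift: move D // fold channels one frame forward in time and
    the next D // fold channels one frame backward, zero-padding the ends."""
    _, _, d = frames.shape
    k = d // fold
    out = frames.copy()
    out[1:, :, :k] = frames[:-1, :, :k]            # forward in time
    out[0, :, :k] = 0
    out[:-1, :, k:2 * k] = frames[1:, :, k:2 * k]  # backward in time
    out[-1, :, k:2 * k] = 0
    return out
```

In this sketch, `shift_features(align_queries(x))` mixes features of the same matched query across neighbouring frames, whereas `shift_features(x)` alone could mix features of unrelated queries, which is the failure case the abstract describes.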

