CV · Apr 4, 2025

TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking

arXiv:2504.03258v1 · 6 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in multi-object tracking for autonomous driving, but the advance is incremental because it builds on existing query denoising and tracking paradigms.

The paper tackles slow convergence and limited temporal learning in query denoising for 3D multi-object tracking. It proposes TQD-Track, which introduces temporal query denoising so that denoising queries carry temporal information and instance-specific features, and it reports consistent performance improvements on the nuScenes dataset.

Query denoising has become a standard training strategy for DETR-based detectors because it addresses their slow convergence. Query denoising can also increase the diversity of training samples for modeling complex scenarios, which is critical for Multi-Object Tracking (MOT) and suggests its potential there. Existing approaches integrate query denoising within the tracking-by-attention paradigm. However, because the denoising process happens only within a single frame, it does not help the tracker learn temporal information. In addition, the attention mask in query denoising prevents information exchange between denoising and object queries, limiting its potential to improve association through self-attention. To address these issues, we propose TQD-Track, which introduces Temporal Query Denoising (TQD) tailored for MOT, enabling denoising queries to carry temporal information and instance-specific feature representations. We apply diverse noise types to the denoising queries to simulate real-world challenges in MOT. We analyze TQD under different tracking paradigms and find that paradigms with an explicit learned data association module, e.g., tracking-by-detection or alternating detection and association, benefit from TQD by a larger margin. For these paradigms, we further design an association mask in the association module so that track and detection queries interact during training in the same way as during inference. Extensive experiments on the nuScenes dataset demonstrate that our approach consistently improves different tracking methods by changing only the training process, especially paradigms with an explicit association module.
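To make the idea of temporal denoising queries more concrete, here is a minimal NumPy sketch of one way such queries could be built from previous-frame ground truth. It is not the authors' implementation. The abstract does not name the noise types, so the three used here (center jitter, dropped tracks, false positives) are assumptions, as are the function names `make_temporal_denoising_queries` and `association_targets` and all parameters. The association mask mentioned in the abstract is not modeled.

```python
import numpy as np


def make_temporal_denoising_queries(prev_centers, prev_ids, rng,
                                    shift_scale=0.5, drop_prob=0.2,
                                    num_false=2, scene_range=50.0):
    """Build noised, track-like queries from previous-frame ground truth.

    Hypothetical helper: the noise types and parameters are illustrative
    assumptions, not taken from the paper.
    """
    prev_centers = np.asarray(prev_centers, dtype=float)
    prev_ids = np.asarray(prev_ids, dtype=int)

    # Assumed noise 1: localization error, modeled as Gaussian jitter on centers.
    noised = prev_centers + rng.normal(0.0, shift_scale, size=prev_centers.shape)

    # Assumed noise 2: missed tracks, modeled by randomly dropping queries.
    keep = rng.random(len(prev_ids)) >= drop_prob
    noised, ids = noised[keep], prev_ids[keep]

    # Assumed noise 3: false-positive tracks, given the instance id -1.
    false_centers = rng.uniform(-scene_range, scene_range,
                                size=(num_false, prev_centers.shape[1]))
    centers = np.concatenate([noised, false_centers], axis=0)
    ids = np.concatenate([ids, np.full(num_false, -1, dtype=int)])
    return centers, ids


def association_targets(track_ids, det_ids):
    """Return a binary matrix whose entry (i, j) is 1 when track query i and
    detection j share a valid (non-negative) instance id."""
    track_ids = np.asarray(track_ids)[:, None]
    det_ids = np.asarray(det_ids)[None, :]
    return ((track_ids == det_ids) & (track_ids >= 0)).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers, ids = make_temporal_denoising_queries(
        [[0.0, 0.0, 0.0], [10.0, 5.0, 0.0], [-3.0, 2.0, 1.0]], [7, 8, 9], rng)
    # Current-frame detections carry instance ids; 11 is a newly born object.
    print(association_targets(ids, [9, 7, 11]))
```

Because the instance ids travel with the noised queries from one frame to the next, a learned association module can be supervised on which noised track query should match which current-frame detection. That is the temporal signal single-frame denoising lacks.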

