CV · LG · Nov 1, 2024

Is Multiple Object Tracking a Matter of Specialization?

arXiv:2411.00553v1 · 5 citations · h-index: 66 · NIPS
Originality: Incremental advance
AI Analysis

This work addresses domain generalization challenges in multiple object tracking and offers a parameter-efficient solution for heterogeneous scenarios, though the approach is incremental.

The paper tackles negative interference and limited domain generalization in transformer-based trackers by introducing PASTA, a framework that combines Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning (MDL) to train a specialized module for each scenario attribute. In zero-shot evaluations on MOT17 and PersonPath22, trackers built from these modules outperform monolithic models.

End-to-end transformer-based trackers have achieved remarkable performance on most human-related datasets. However, training these trackers in heterogeneous scenarios poses significant challenges, including negative interference (where the model learns conflicting scene-specific parameters) and limited domain generalization, which often necessitates expensive fine-tuning to adapt the models to new domains. In response to these challenges, we introduce Parameter-efficient Scenario-specific Tracking Architecture (PASTA), a novel framework that combines Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning (MDL). Specifically, we define key scenario attributes (e.g., camera viewpoint, lighting condition) and train specialized PEFT modules for each attribute. These expert modules are combined in parameter space, enabling systematic generalization to new domains without increasing inference time. Extensive experiments on MOTSynth, along with zero-shot evaluations on MOT17 and PersonPath22, demonstrate that a neural tracker built from carefully selected modules surpasses its monolithic counterpart. We release models and code.
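The abstract states that attribute-specific expert modules are "combined in parameter space" so that inference time does not grow. A minimal sketch of that general idea follows, under assumptions that do not come from the paper: each expert is a LoRA-style low-rank pair (B, A) whose weight delta B·A is summed (optionally weighted) into a frozen base matrix, and experts are keyed by hypothetical "attribute:value" names. The functions `select_experts` and `merge_experts` are illustrative only, not PASTA's code.

```python
"""Hypothetical sketch: merging attribute-specific low-rank experts in parameter space.

Assumptions (not from the PASTA paper): experts are LoRA-style (B, A) pairs,
delta W = B @ A, and merging is a weighted sum of deltas added to a frozen base.
"""
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    # Plain matrix product so the sketch needs no third-party packages.
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def select_experts(attributes: Dict[str, str]) -> List[str]:
    # Hypothetical naming scheme: one expert per (attribute, value) pair.
    return [f"{name}:{value}" for name, value in sorted(attributes.items())]


def merge_experts(base: Matrix,
                  experts: Dict[str, Tuple[Matrix, Matrix]],
                  selected: Sequence[str],
                  weights: Optional[Dict[str, float]] = None) -> Matrix:
    """Return base + sum_i w_i * (B_i @ A_i) over the selected experts."""
    merged = [row[:] for row in base]  # copy, so the base weight stays frozen
    for name in selected:
        b, a = experts[name]
        w = 1.0 if weights is None else weights.get(name, 1.0)
        for i, row in enumerate(matmul(b, a)):
            for j, v in enumerate(row):
                merged[i][j] += w * v
    return merged


# Toy 2x2 base weight and rank-1 experts (B is 2x1, A is 1x2); values are made up.
base = [[1.0, 0.0], [0.0, 1.0]]
experts = {
    "viewpoint:high": ([[1.0], [0.0]], [[0.0, 2.0]]),
    "lighting:night": ([[0.0], [1.0]], [[1.0, 0.0]]),
    "lighting:day": ([[0.5], [0.5]], [[1.0, 1.0]]),
}

scene = select_experts({"viewpoint": "high", "lighting": "night"})
scene_weight = merge_experts(base, experts, scene)
weighted = merge_experts(base, experts, ["lighting:day"], {"lighting:day": 2.0})
```

Because the merged matrix has the same shape as the base weight, a forward pass costs the same no matter how many experts were folded in; keeping the experts as separate branches at inference time would instead add one low-rank product per expert.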

