CVMar 21, 2023

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

arXiv:2303.12079v239 citationsh-index: 61
Originality Incremental advance
AI Analysis

This work addresses the issue of redundant training and parameters in visual object tracking for researchers and practitioners, though it is incremental as it builds on existing best practices.

The paper tackles the problem of divergent solutions for instance and category tracking tasks by proposing a unified tracking-with-detection paradigm, resulting in OmniTracker achieving on-par or better results across seven datasets while reducing model redundancy.

Visual Object Tracking (VOT) aims to estimate the positions of target objects in a video sequence, which is an important vision task with various real-world applications. Depending on whether the initial states of target objects are specified by provided annotations in the first frame or the categories, VOT could be classified as instance tracking (e.g., SOT and VOS) and category tracking (e.g., MOT, MOTS, and VIS) tasks. Different definitions have led to divergent solutions for these two types of tasks, resulting in redundant training expenses and parameter overhead. In this paper, combing the advantages of the best practices developed in both communities, we propose a novel tracking-with-detection paradigm, where tracking supplements appearance priors for detection and detection provides tracking with candidate bounding boxes for the association. Equipped with such a design, a unified tracking model, OmniTracker, is further presented to resolve all the tracking tasks with a fully shared network architecture, model weights, and inference pipeline, eliminating the need for task-specific architectures and reducing redundancy in model parameters. We conduct extensive experimentation on seven prominent tracking datasets of different tracking tasks, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, and demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes