CV · Dec 28, 2023

ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

Yifan Bai, Zeyang Zhao, Yihong Gong, Xing Wei

arXiv:2312.17133v324.1133 citations · h-index: 64 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses video object tracking for computer vision applications, presenting an incremental improvement over prior methods.

The paper tackles video object tracking by integrating localization and appearance analysis into a unified generative framework, achieving state-of-the-art performance with an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while running 3.6 times faster than its predecessor, ARTrack.

We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework to "read out" the object's trajectory and "retell" its appearance in an autoregressive manner. This approach fosters a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity, obviating the less efficient intra-frame autoregression and hand-tuned parameters for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while demonstrating a remarkable efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while being 3.6× faster than ARTrack. The code will be released.
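
The autoregressive idea in the abstract, where each frame's estimate is conditioned on previous trajectory and appearance estimates, can be illustrated with a minimal sketch. This is not the ARTrackV2 implementation: the names `track` and `toy_step`, the `(cx, cy, w, h)` box layout, and the list-of-floats appearance state are all hypothetical, and `toy_step` swaps in constant-velocity extrapolation and simple averaging where the paper uses a learned generative model. Only the feedback structure of the loop is the point.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, ...]   # hypothetical layout: (cx, cy, w, h)
Appearance = List[float]  # stand-in for a learned appearance representation
StepFn = Callable[[Sequence[Box], Appearance, Sequence[float]],
                  Tuple[Box, Appearance]]


def track(frames: Sequence[Sequence[float]],
          init_trajectory: Sequence[Box],
          init_appearance: Appearance,
          step_fn: StepFn,
          history_len: int = 3) -> Tuple[List[Box], Appearance]:
    """Run an autoregressive loop: each output is fed back as input."""
    trajectory = list(init_trajectory)
    appearance = list(init_appearance)
    for frame in frames:
        # Condition on the most recent estimates, not on ground truth.
        history = trajectory[-history_len:]
        box, appearance = step_fn(history, appearance, frame)
        trajectory.append(box)
    return trajectory, appearance


def toy_step(history: Sequence[Box],
             appearance: Appearance,
             frame: Sequence[float]) -> Tuple[Box, Appearance]:
    """Placeholder for a learned model (assumption, not the paper's method)."""
    last = history[-1]
    prev = history[-2] if len(history) > 1 else last
    # "Where to look": constant-velocity extrapolation of the box.
    box = tuple(2.0 * a - b for a, b in zip(last, prev))
    # "How to describe": average old appearance with the frame features.
    new_appearance = [(a + f) / 2.0 for a, f in zip(appearance, frame)]
    return box, new_appearance


if __name__ == "__main__":
    frames = [[1.0, 1.0], [1.0, 1.0]]  # toy per-frame feature vectors
    start = [(0.0, 0.0, 10.0, 10.0), (1.0, 2.0, 10.0, 10.0)]
    boxes, appearance = track(frames, start, [0.0, 0.0], toy_step)
    print(boxes[-1], appearance)
```

In a real tracker, `step_fn` would be a trained network that consumes image features; the loop around it would keep the same shape, with trajectory and appearance evolving jointly from one frame to the next.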

