CV | Sep 11, 2023

Mobile Vision Transformer-based Visual Object Tracking

arXiv:2309.05829v1 | 16 citations | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate visual object tracking on resource-constrained devices, representing an incremental improvement in lightweight tracking methods.

The paper tackles the computational cost of object tracking by proposing a lightweight tracker built on Mobile Vision Transformers. It achieves higher accuracy than recent lightweight trackers on the large-scale datasets GOT10k and TrackingNet, and it outperforms DiMP-50 while having 4.7 times fewer parameters and running at 2.8 times its speed on a GPU.

The introduction of robust backbones, such as Vision Transformers, has improved the performance of object tracking algorithms in recent years. However, these state-of-the-art trackers are computationally expensive since they have a large number of model parameters and rely on specialized hardware (e.g., GPU) for faster inference. On the other hand, recent lightweight trackers are fast but are less accurate, especially on large-scale datasets. We propose a lightweight, accurate, and fast tracking algorithm using Mobile Vision Transformers (MobileViT) as the backbone for the first time. We also present a novel approach of fusing the template and search region representations in the MobileViT backbone, thereby generating superior feature encoding for target localization. The experimental results show that our MobileViT-based Tracker, MVT, surpasses the performance of recent lightweight trackers on the large-scale datasets GOT10k and TrackingNet, and with a high inference speed. In addition, our method outperforms the popular DiMP-50 tracker despite having 4.7 times fewer model parameters and running at 2.8 times its speed on a GPU. The tracker code and models are available at https://github.com/goutamyg/MVT
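The abstract does not spell out how the template and search region representations are fused inside the backbone. As a rough illustration of the general idea only, the sketch below shows one common way to fuse two sets of features: concatenate their tokens and run self-attention over the joint sequence, so that each search token can attend to the template. Everything here is an assumption made for illustration (the function name `joint_attention`, identity query/key/value projections, a single head, random token values); it should not be read as the MVT implementation, which is available in the linked repository.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(template_tokens, search_tokens):
    """Self-attention over the concatenated template and search tokens.

    Hypothetical simplification: one head and identity query/key/value
    projections, so the tokens themselves act as Q, K and V.
    """
    tokens = template_tokens + search_tokens  # concatenate along the token axis
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in tokens:
        # Scaled dot-product scores of this token against every token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of all tokens (template and search alike).
        fused.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )
    n_template = len(template_tokens)
    return fused[:n_template], fused[n_template:]


# Toy inputs: 4 template tokens and 16 search tokens, each of dimension 8.
rng = random.Random(0)
template = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
search = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]

fused_template, fused_search = joint_attention(template, search)
```

Because the attention runs over the joint sequence, the fused search features depend on the template: changing the template tokens changes `fused_search`, which is the property a tracker needs for target localization.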

Code Implementations: 1 repo
