CV · LG · Feb 6, 2025

OneTrack-M: A multitask approach to transformer-based MOT models

arXiv:2502.04478v1
Originality: Incremental advance
AI Analysis

This work addresses efficiency challenges in MOT for real-time applications like autonomous vehicles and surveillance, though it is incremental as it builds on existing transformer-based methods.

The paper addresses computational efficiency and accuracy in Multi-Object Tracking (MOT) by introducing OneTrack-M, a transformer-based model that removes the decoder from the architecture, yielding at least 25% faster inference while maintaining or improving tracking accuracy.

Multi-Object Tracking (MOT) is a critical problem in computer vision, essential for understanding how objects move and interact in videos. The field faces significant challenges such as occlusions and complex environmental dynamics, which affect model accuracy and efficiency. While traditional approaches have relied on Convolutional Neural Networks (CNNs), the introduction of transformers has brought substantial advancements. This work introduces OneTrack-M, a transformer-based MOT model designed to improve the computational efficiency and accuracy of tracking. Our approach simplifies the typical transformer-based architecture by eliminating the need for a decoder model for object detection and tracking. Instead, the encoder alone serves as the backbone for temporal data interpretation, significantly reducing processing time and increasing inference speed. Additionally, we employ innovative data pre-processing and multitask training techniques to address occlusion and diverse objective challenges within a single set of weights. Experimental results demonstrate that OneTrack-M achieves at least 25% faster inference times compared to state-of-the-art models in the literature while maintaining or improving tracking accuracy metrics. These improvements highlight the potential of the proposed solution for real-time applications such as autonomous vehicles, surveillance systems, and robotics, where rapid responses are crucial for system effectiveness.
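As a rough illustration of what the "at least 25% faster inference times" figure could mean for a real-time pipeline, the sketch below converts a latency reduction into frames per second. It assumes "25% faster" means a 25% reduction in per-frame latency, which the abstract does not state explicitly. The 40 ms baseline and the function name are hypothetical and are not taken from the paper.

```python
def fps_after_latency_cut(baseline_ms: float, reduction: float) -> float:
    """Frames per second after cutting per-frame latency by `reduction` (0 <= reduction < 1)."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    new_ms = baseline_ms * (1.0 - reduction)  # latency after the cut, in milliseconds
    return 1000.0 / new_ms


# Hypothetical baseline latency; NOT a number reported in the paper.
baseline_ms = 40.0
baseline_fps = 1000.0 / baseline_ms

# Assumption: "25% faster" is read as a 25% reduction in per-frame latency.
new_fps = fps_after_latency_cut(baseline_ms, 0.25)
speedup = new_fps / baseline_fps

print(f"baseline: {baseline_fps:.1f} FPS, after cut: {new_fps:.1f} FPS, speedup: {speedup:.2f}x")
```

Under this reading, a 25% latency cut corresponds to roughly a 1.33x throughput gain regardless of the baseline. If the figure instead refers to a 25% increase in throughput, the gain would be 1.25x.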

