CV, May 31, 2022

Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

arXiv:2205.15495v1, 6 citations, h-index: 77, Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving tracking accuracy in computer vision for applications like surveillance and autonomous driving, representing an incremental advance by integrating spatial-temporal and appearance features within a single, efficient model.

The paper tackles the problem of multiple object tracking by proposing TransSTAM, a Transformer-based method that jointly models appearance and spatial-temporal relationships, achieving clear performance improvements in IDF1 and HOTA metrics on MOT16, MOT17, and MOT20 benchmarks compared to previous state-of-the-art approaches.
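For readers unfamiliar with IDF1, one of the two metrics cited above, it is the F1 score over identity-consistent matches: IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The short sketch below only illustrates that formula. The function name and the example counts are made up for illustration and are not results from the paper.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 from identity true positives, false positives, false negatives.

    The counts are assumed to come from an identity-level matching between
    predicted and ground-truth trajectories, which is not computed here.
    """
    denom = 2 * idtp + idfp + idfn
    # Return 0.0 when there is nothing to score, to avoid dividing by zero.
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, chosen only to show the arithmetic.
score = idf1(idtp=80, idfp=10, idfn=30)
```

HOTA is considerably more involved (it combines detection and association accuracy over a range of localization thresholds) and is not sketched here.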

The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. In this paper, we propose a novel solution named TransSTAM, which leverages Transformer to effectively model both the appearance features of each object and the spatial-temporal relationships among objects. TransSTAM consists of two major parts: (1) The encoder utilizes the powerful self-attention mechanism of Transformer to learn discriminative features for each tracklet; (2) The decoder adopts the standard cross-attention mechanism to model the affinities between the tracklets and the detections by taking both spatial-temporal and appearance features into account. TransSTAM has two major advantages: (1) It is solely based on the encoder-decoder architecture and enjoys a compact network design, hence being computationally efficient; (2) It can effectively learn spatial-temporal and appearance features within one model, hence achieving better tracking accuracy. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA with respect to previous state-of-the-art approaches on all the benchmarks. Our code is available at https://github.com/icicle4/TranSTAM.
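The tracklet-to-detection affinity idea in the abstract can be illustrated with a minimal sketch. It is not the TransSTAM implementation. It assumes each tracklet and each detection is already summarized by a single feature vector (standing in for combined appearance and spatial-temporal embeddings), scores pairs with scaled dot-product attention, and then associates them with a simple greedy rule. The names `affinity_matrix` and `greedy_match`, the threshold, and the toy vectors are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def affinity_matrix(tracklet_feats, detection_feats):
    """Scaled dot-product scores, softmax-normalized over detections.

    Rows index tracklets and columns index detections. The feature vectors
    are assumed to be precomputed embeddings of equal length.
    """
    scale = math.sqrt(len(tracklet_feats[0]))
    scores = [
        [sum(t * d for t, d in zip(trk, det)) / scale for det in detection_feats]
        for trk in tracklet_feats
    ]
    return [softmax(row) for row in scores]


def greedy_match(affinity, threshold=0.5):
    """Greedy one-to-one association (a simple stand-in, assumed here)."""
    pairs = sorted(
        ((a, i, j) for i, row in enumerate(affinity) for j, a in enumerate(row)),
        reverse=True,
    )
    matches, used_dets = {}, set()
    for a, i, j in pairs:
        if a < threshold:
            break  # remaining pairs are all weaker than the threshold
        if i not in matches and j not in used_dets:
            matches[i] = j
            used_dets.add(j)
    return matches


# Toy embeddings: tracklet 0 resembles detection 1, tracklet 1 resembles detection 0.
tracklets = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
detections = [[0.0, 4.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0]]
aff = affinity_matrix(tracklets, detections)
matches = greedy_match(aff)
```

In a learned model the embeddings and projections would be trained end to end. The greedy rule here is only the simplest way to turn an affinity matrix into assignments, and an optimal assignment solver could be substituted.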

Code Implementations: 1 repo
