UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
This work addresses multi-object tracking for computer vision applications. It offers a universal training objective and is incremental in the sense that it integrates with existing systems without architectural changes.
The paper tackles multi-object tracking by introducing UniTrack, a plug-and-play graph-theoretic loss function that optimizes tracking-specific objectives, resulting in up to a 53% reduction in identity switches and 12% IDF1 improvements across benchmarks.
We present UniTrack, a plug-and-play graph-theoretic loss function designed to enhance multi-object tracking (MOT) performance by directly optimizing tracking-specific objectives through unified differentiable learning. Unlike prior graph-based MOT methods that redesign tracking architectures, UniTrack provides a universal training objective that integrates detection accuracy, identity preservation, and spatiotemporal consistency into a single end-to-end trainable loss function, so it can be added to existing MOT systems without architectural modifications. Through differentiable graph representation learning, UniTrack enables networks to learn holistic representations of motion continuity and identity relationships across frames. We validate UniTrack across diverse tracking models (Trackformer, MOTR, FairMOT, ByteTrack, GTR, and MOTE) and multiple challenging benchmarks, demonstrating consistent improvements across all tested architectures and datasets. Extensive evaluations show up to 53% reduction in identity switches and 12% IDF1 improvements across challenging benchmarks, with GTR achieving peak performance gains of 9.7% MOTA on SportsMOT.
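To make the idea of a single loss that combines detection accuracy, identity preservation, and spatiotemporal consistency more concrete, the following is a minimal sketch of how such a composite objective could be structured. It is not the UniTrack formulation: the three terms (`detection_term`, `identity_term`, `temporal_term`), the weights `w_det`, `w_id`, `w_temp`, and the choice of L1, cross-entropy, and acceleration penalties are all assumptions made for illustration. Plain Python is used only to show the structure; a trainable version would presumably be written in an automatic-differentiation framework.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def detection_term(pred_boxes, gt_boxes):
    """Mean L1 distance between matched predicted and ground-truth boxes
    (hypothetical stand-in for a detection-accuracy term)."""
    per_box = [
        sum(abs(p - g) for p, g in zip(pb, gb)) / len(pb)
        for pb, gb in zip(pred_boxes, gt_boxes)
    ]
    return sum(per_box) / len(per_box)


def identity_term(affinity, gt_match):
    """Cross-entropy over a soft association graph between two frames
    (hypothetical stand-in for an identity-preservation term).

    affinity[i][j]: score linking track i in frame t to detection j in frame t+1.
    gt_match[i]:    index of the true detection for track i.
    """
    losses = [-math.log(softmax(row)[j]) for row, j in zip(affinity, gt_match)]
    return sum(losses) / len(losses)


def temporal_term(centers):
    """Mean squared second difference (acceleration) of one track's centers
    (hypothetical stand-in for a spatiotemporal-consistency term)."""
    accs = [
        sum((x0 - 2 * x1 + x2) ** 2 for x0, x1, x2 in zip(a, b, c))
        for a, b, c in zip(centers, centers[1:], centers[2:])
    ]
    return sum(accs) / len(accs) if accs else 0.0


def combined_loss(pred_boxes, gt_boxes, affinity, gt_match, centers,
                  w_det=1.0, w_id=1.0, w_temp=1.0):
    """Weighted sum of the three assumed terms; the weights are illustrative."""
    return (
        w_det * detection_term(pred_boxes, gt_boxes)
        + w_id * identity_term(affinity, gt_match)
        + w_temp * temporal_term(centers)
    )
```

Because the sketch only adds a scalar objective on top of quantities a tracker already produces (boxes, association scores, track positions), it illustrates why a loss of this kind can be attached to an existing model without changing its architecture.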