SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth
This work addresses multi-object tracking in congested scenes, a persistent challenge in computer vision applications, and offers an incremental improvement over existing methods.
The paper tackles multi-object tracking in crowded scenes with frequent occlusions. It proposes a pseudo-depth estimation method and a depth cascading matching algorithm that together decompose a dense scene into sparse subsets, and it achieves performance comparable to state-of-the-art methods on the MOT17 and MOT20 benchmarks using only IoU matching.
Robust and efficient data association has long been a central problem in multiple-object tracking (MOT). Although existing tracking methods achieve impressive performance, congestion and frequent occlusions remain challenging. We show that sparsely decomposing dense scenes is a crucial step toward better association of occluded targets. To this end, we first propose a pseudo-depth estimation method that obtains the relative depth of targets from 2D images. Second, we design a depth cascading matching (DCM) algorithm, which uses the obtained depth information to convert a dense target set into multiple sparse target subsets and performs data association on these subsets in order from near to far. By integrating the pseudo-depth method and the DCM strategy into the data association process, we obtain a new tracker, called SparseTrack. SparseTrack offers a new perspective on the challenging problem of MOT in crowded scenes. Using only IoU matching, SparseTrack achieves performance comparable to state-of-the-art (SOTA) methods on the MOT17 and MOT20 benchmarks. Code and models are publicly available at \url{https://github.com/hustvl/SparseTrack}.
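To make the idea of depth-ordered association concrete, the following is a minimal Python sketch written for illustration only; it is not the released SparseTrack code. It rests on several assumptions not stated in the abstract: pseudo-depth is approximated by the distance from a box's bottom edge to the bottom of the image, depth bands have equal width, unmatched targets are carried over to the next (farther) band, and a simple greedy IoU matcher stands in for whatever assignment solver the tracker actually uses. All function names and parameters (`pseudo_depth`, `split_by_depth`, `depth_cascade_match`, `num_levels`, `iou_thresh`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def pseudo_depth(box: Box, image_height: float) -> float:
    """Assumed depth proxy: gap between box bottom and image bottom (small = near)."""
    return max(0.0, image_height - box[3])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def split_by_depth(boxes: Sequence[Box], image_height: float,
                   num_levels: int, max_depth: float) -> List[List[int]]:
    """Bucket box indices into equal-width depth bands, nearest band first."""
    levels: List[List[int]] = [[] for _ in range(num_levels)]
    for i, box in enumerate(boxes):
        if max_depth > 0:
            k = int(pseudo_depth(box, image_height) / max_depth * num_levels)
        else:
            k = 0
        levels[min(k, num_levels - 1)].append(i)
    return levels


def greedy_iou_match(track_ids, det_ids, tracks, dets, iou_thresh):
    """Greedy stand-in for an assignment solver: take the highest-IoU pairs first."""
    pairs = sorted(((iou(tracks[t], dets[d]), t, d)
                    for t in track_ids for d in det_ids), reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, t, d in pairs:
        if score < iou_thresh:
            break
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    left_t = [t for t in track_ids if t not in used_t]
    left_d = [d for d in det_ids if d not in used_d]
    return matches, left_t, left_d


def depth_cascade_match(tracks: List[Box], dets: List[Box], image_height: float,
                        num_levels: int = 3, iou_thresh: float = 0.3):
    """Associate tracks and detections band by band, from near to far."""
    depths = [pseudo_depth(b, image_height) for b in list(tracks) + list(dets)]
    max_depth = max(depths, default=0.0)
    t_levels = split_by_depth(tracks, image_height, num_levels, max_depth)
    d_levels = split_by_depth(dets, image_height, num_levels, max_depth)
    matches, left_t, left_d = [], [], []
    for t_lvl, d_lvl in zip(t_levels, d_levels):
        # Leftovers from nearer bands get another chance in the next band.
        m, left_t, left_d = greedy_iou_match(left_t + t_lvl, left_d + d_lvl,
                                             tracks, dets, iou_thresh)
        matches.extend(m)
    return matches, left_t, left_d


# Two targets at clearly different depths in a 600-pixel-high frame.
tracks = [(100, 300, 200, 500), (110, 100, 190, 260)]
dets = [(105, 305, 205, 505), (112, 102, 192, 262)]
matches, lost_tracks, new_dets = depth_cascade_match(tracks, dets, image_height=600)
```

In this sketch each band contains far fewer candidates than the full frame, which is the sense in which a dense target set becomes several sparse subsets; the number of bands and the IoU threshold are illustrative values, not settings reported by the authors.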