When to Extract ReID Features: A Selective Approach for Improved Multiple Object Tracking
This work addresses efficiency and accuracy issues in multiple object tracking for applications like video surveillance, though it is incremental as it builds on existing SOTA methods.
The paper tackles the overhead of extracting ReID features in multiple object tracking by proposing a selective approach that minimizes runtime while preserving accuracy, showing significant reductions in runtime and improved accuracy on datasets like MOT17, MOT20, and DanceTrack.
Extracting and matching Re-Identification (ReID) features is used by many state-of-the-art (SOTA) Multiple Object Tracking (MOT) methods, particularly effective against frequent and long-term occlusions. While end-to-end object detection and tracking have been the main focus of recent research, they have yet to outperform traditional methods in benchmarks like MOT17 and MOT20. Thus, from an application standpoint, methods with separate detection and embedding remain the best option for accuracy, modularity, and ease of implementation, though they are impractical for edge devices due to the overhead involved. In this paper, we investigate a selective approach to minimize the overhead of feature extraction while preserving accuracy, modularity, and ease of implementation. This approach can be integrated into various SOTA methods. We demonstrate its effectiveness by applying it to StrongSORT and Deep OC-SORT. Experiments on MOT17, MOT20, and DanceTrack datasets show that our mechanism retains the advantages of feature extraction during occlusions while significantly reducing runtime. Additionally, it improves accuracy by preventing confusion in the feature-matching stage, particularly in cases of deformation and appearance similarity, which are common in DanceTrack. https://github.com/emirhanbayar/Fast-StrongSORT, https://github.com/emirhanbayar/Fast-Deep-OC-SORT