Robust Multi-Object Tracking by Marginal Inference
This work addresses a domain-specific problem in video analysis by providing a more stable and interpretable approach for multi-object tracking, though it is incremental as it builds on existing trackers.
The paper tackles the difficulty of choosing a single threshold for discarding impossible object pairs in multi-object tracking. It introduces an efficient method that computes marginal probabilities, which act as normalized distances, and reports an improvement of about one point in the IDF1 metric when applied to existing trackers, with competitive results on the MOT17 and MOT20 benchmarks.
Multi-object tracking in videos requires solving a fundamental one-to-one assignment problem between objects in adjacent frames. Most methods first discard impossible pairs whose feature distances exceed a threshold, then link the remaining objects with the Hungarian algorithm to minimize the overall distance. However, we find that the distribution of distances computed from Re-ID features can vary significantly across videos, so no single threshold allows us to safely discard impossible pairs. To address this problem, we present an efficient approach that computes a marginal probability for each pair of objects in real time. The marginal probability can be regarded as a normalized distance that is significantly more stable than the original feature distance. As a result, we can use a single threshold for all videos. The approach is general and can be applied to existing trackers to obtain an improvement of about one point in the IDF1 metric. It achieves competitive results on the MOT17 and MOT20 benchmarks. In addition, the computed probability is more interpretable, which facilitates subsequent post-processing operations.
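To make the idea of a marginal probability concrete, the following is a minimal Python sketch. It is not the paper's efficient real-time algorithm. It assumes a small square distance matrix, enumerates every complete one-to-one matching by brute force, and weights each matching by exp(-total distance / temperature). The function name `marginal_probabilities`, the `temperature` parameter, and the example distances are hypothetical choices made for illustration.

```python
import itertools
import math


def marginal_probabilities(dist, temperature=0.1):
    """Brute-force pair marginals over all complete one-to-one matchings.

    dist is a square list of lists: dist[i][j] is the feature distance
    between object i in one frame and object j in the next frame.
    temperature is an assumed scaling constant, not taken from the paper.
    """
    n = len(dist)
    # Weight every complete matching (a permutation) by exp(-cost / T).
    weights = {}
    for perm in itertools.permutations(range(n)):
        cost = sum(dist[i][perm[i]] for i in range(n))
        weights[perm] = math.exp(-cost / temperature)
    total = sum(weights.values())

    # The marginal of pair (i, j) is the normalized weight of all
    # matchings that link i to j.
    marginals = [[0.0] * n for _ in range(n)]
    for perm, w in weights.items():
        for i in range(n):
            marginals[i][perm[i]] += w / total
    return marginals


# Illustrative distances: small on the diagonal, large elsewhere.
dist = [
    [0.20, 0.90, 0.80],
    [0.85, 0.30, 0.90],
    [0.90, 0.80, 0.25],
]
# The same scene with every distance shifted by a constant, mimicking a
# video whose Re-ID distances are uniformly larger.
shifted = [[d + 0.5 for d in row] for row in dist]

probs = marginal_probabilities(dist)
probs_shifted = marginal_probabilities(shifted)

# One threshold on the probabilities serves both inputs in this toy case.
threshold = 0.5
links = [(i, j) for i in range(3) for j in range(3) if probs[i][j] > threshold]
links_shifted = [
    (i, j) for i in range(3) for j in range(3) if probs_shifted[i][j] > threshold
]
```

Under these assumptions each row and each column of the result sums to one. Adding a constant to every distance also leaves the marginals unchanged, because every complete matching contains the same number of pairs. This hints at why such a quantity can be thresholded more consistently than raw distances. Brute-force enumeration grows factorially with the number of objects, so it is only suitable for toy inputs.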