Learning to Divide and Conquer for Online Multi-Target Tracking
This work addresses multi-target tracking in static-camera scenes for applications such as surveillance; it is incremental in that it builds on the existing tracking-by-detection paradigm.
The paper tackles ambiguities in online multi-target tracking caused by occlusions and detection errors. It proposes a tracker that uses features selectively and partitions the assignment problem into local subproblems, and it reports a significant improvement in tracking performance (MOTA +10%) over state-of-the-art methods.
Online Multiple Target Tracking (MTT) is often addressed within the tracking-by-detection paradigm. Detections are first extracted independently in each frame, and object trajectories are then built by maximizing specifically designed coherence functions. Nevertheless, ambiguities arise in the presence of occlusions or detection errors. In this paper we claim that the ambiguities in tracking can be resolved by a selective use of the features: working with more reliable features when possible and exploiting a deeper representation of the target only when necessary. To this end, we propose an online divide and conquer tracker for static camera scenes, which partitions the assignment problem into local subproblems and solves them by selectively choosing and combining the best features. The complete framework is cast as a structural learning task that unifies these phases and learns the tracker parameters from examples. Experiments on two different datasets highlight a significant improvement in tracking performance (MOTA +10%) over the state of the art.
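
To make the divide and conquer idea concrete, the following is a minimal sketch of partitioning a frame's target-to-detection assignment into local subproblems and solving each one independently. It is not the method of the paper: the distance gate `radius`, the connected-component grouping, the exhaustive per-zone search, and all function names (`partition`, `solve_zone`, `associate`) are assumptions chosen for illustration. The learned, selective combination of features is replaced here by plain Euclidean distance.

```python
from math import hypot


def distance(a, b):
    """Euclidean distance between two 2D points."""
    return hypot(a[0] - b[0], a[1] - b[1])


def partition(targets, detections, radius):
    """Split the frame into local zones (hypothetical gating rule).

    A zone is a connected component of the graph that links a target and a
    detection whenever they are closer than `radius`. Union-find does the
    grouping; detection j is stored as node len(targets) + j.
    """
    n_t = len(targets)
    parent = list(range(n_t + len(detections)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, t in enumerate(targets):
        for j, d in enumerate(detections):
            if distance(t, d) <= radius:
                parent[find(i)] = find(n_t + j)

    zones = {}
    for node in range(len(parent)):
        t_ids, d_ids = zones.setdefault(find(node), ([], []))
        if node < n_t:
            t_ids.append(node)
        else:
            d_ids.append(node - n_t)
    return list(zones.values())


def solve_zone(t_ids, d_ids, targets, detections, radius):
    """Exhaustive minimum-cost matching inside one zone (small zones only)."""
    if not t_ids:
        return 0.0, []
    first, rest = t_ids[0], t_ids[1:]
    # Option 1: leave the first target unmatched, paying `radius` as miss cost.
    best_cost, best_pairs = solve_zone(rest, d_ids, targets, detections, radius)
    best_cost += radius
    # Option 2: match it to any detection inside the gate.
    for j in d_ids:
        c = distance(targets[first], detections[j])
        if c > radius:
            continue
        remaining = [d for d in d_ids if d != j]
        sub_cost, sub_pairs = solve_zone(rest, remaining, targets, detections, radius)
        if c + sub_cost < best_cost:
            best_cost, best_pairs = c + sub_cost, [(first, j)] + sub_pairs
    return best_cost, best_pairs


def associate(targets, detections, radius):
    """Solve every zone independently and merge the (target, detection) pairs."""
    matches = []
    for t_ids, d_ids in partition(targets, detections, radius):
        matches += solve_zone(t_ids, d_ids, targets, detections, radius)[1]
    return sorted(matches)


# Two nearby targets form one ambiguous zone; the third is isolated and trivial.
targets = [(0.0, 0.0), (1.0, 0.0), (50.0, 50.0)]
detections = [(0.2, 0.0), (1.1, 0.0), (50.5, 50.0), (90.0, 90.0)]
print(associate(targets, detections, radius=2.0))
```

In this sketch an isolated target with a single nearby detection becomes a zone of its own and is resolved by distance alone, while crowded zones are where a tracker in the spirit of the abstract would bring in richer features. For larger zones, the exhaustive search would typically be swapped for a polynomial-time assignment solver such as the Hungarian algorithm.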