Tracking The Untrackable: Learning To Track Multiple Cues with Long-Term Dependencies
This paper addresses the challenge of tracking occluded or similar-appearance targets in multi-target tracking; its contribution is an incremental improvement over existing methods.
The paper tackles Multi-Target Tracking (MTT) with an online method that encodes long-term temporal dependencies across multiple cues, so that occluded or similar-appearance targets can be tracked accurately. It reports outperforming previous works on multiple datasets, including the MOT benchmark.
The majority of existing solutions to the Multi-Target Tracking (MTT) problem do not combine cues in a coherent end-to-end fashion over a long period of time. In contrast, we present an online method that encodes long-term temporal dependencies across multiple cues. One key challenge for tracking methods is to accurately track targets that are occluded or that share similar appearance properties with surrounding objects. To address this challenge, we present a structure of Recurrent Neural Networks (RNN) that jointly reasons on multiple cues over a temporal window. With it, we are able to correct many data association errors and recover observations from an occluded state. We demonstrate the robustness of our data-driven approach by tracking multiple targets using their appearance, motion, and even interactions. Our method outperforms previous works on multiple publicly available datasets, including the challenging MOT benchmark.
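The idea of a structure of RNNs that jointly reasons on multiple cues over a temporal window can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not specify cell types, layer sizes, feature dimensions, or the training procedure, so everything below is an assumption made for illustration: the plain tanh recurrent cell, the hypothetical names TanhRNN and MultiCueScorer, the hidden size of 4, the random untrained weights, and the sigmoid output read as a target-detection similarity score. Each cue (appearance, motion, interaction) gets its own recurrent encoder, and a second recurrent layer combines the per-cue states at every frame in the window.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TanhRNN:
    """Assumed cell: h_t = tanh(W_x x_t + W_h h_{t-1}). The source does not name a cell type."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        self.w_x = make_matrix(hidden_size, input_size, rng)
        self.w_h = make_matrix(hidden_size, hidden_size, rng)

    def step(self, x, h):
        a = matvec(self.w_x, x)
        b = matvec(self.w_h, h)
        return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


class MultiCueScorer:
    """Hypothetical two-level structure: one RNN per cue, one RNN over the joint state."""

    def __init__(self, cue_sizes, hidden_size=4, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.cue_rnns = {
            name: TanhRNN(size, hidden_size, rng) for name, size in cue_sizes.items()
        }
        self.target_rnn = TanhRNN(hidden_size * len(cue_sizes), hidden_size, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

    def score(self, cue_windows):
        """cue_windows maps each cue name to a list of per-frame feature vectors."""
        names = list(self.cue_rnns)
        window_length = len(cue_windows[names[0]])
        h_cues = {name: [0.0] * self.hidden_size for name in names}
        h_target = [0.0] * self.hidden_size
        for t in range(window_length):
            joint = []
            for name in names:
                # Each cue keeps its own temporal memory across the window.
                h_cues[name] = self.cue_rnns[name].step(cue_windows[name][t], h_cues[name])
                joint.extend(h_cues[name])
            # The target-level RNN reasons jointly on all cue states at frame t.
            h_target = self.target_rnn.step(joint, h_target)
        logit = sum(w * h for w, h in zip(self.w_out, h_target))
        return 1.0 / (1.0 + math.exp(-logit))  # similarity score in (0, 1)


# Toy three-frame window with made-up feature values for each cue.
scorer = MultiCueScorer({"appearance": 3, "motion": 2, "interaction": 2})
window = {
    "appearance": [[0.1, 0.2, 0.3], [0.1, 0.25, 0.3], [0.12, 0.2, 0.31]],
    "motion": [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]],
    "interaction": [[0.0, 0.5], [0.1, 0.4], [0.0, 0.6]],
}
score = scorer.score(window)
print(f"similarity score: {score:.3f}")
```

In an online tracker, a score of this kind would be computed for each candidate pairing of a tracked target with a new detection and then passed to a data association step. That step, along with training the weights, is outside the scope of this sketch.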