Robust Online Multi-target Visual Tracking using a HISP Filter with Discriminative Deep Appearance Learning
This work addresses robust multi-target tracking for video surveillance applications, representing an incremental advancement by combining existing filter methods with deep learning.
The authors tackled the problem of online multi-target visual tracking in varying conditions by proposing a tracker based on the HISP filter with deep appearance learning, achieving significant improvements in tracking accuracy on MOT16 and MOT17 benchmarks.
We propose a novel online multi-target visual tracker based on the recently developed Hypothesized and Independent Stochastic Population (HISP) filter. The HISP filter combines the advantages of traditional tracking approaches such as multiple hypothesis tracking (MHT) and point-process-based approaches such as the probability hypothesis density (PHD) filter: it has linear complexity while maintaining track identities. We apply this filter to track multiple targets in video sequences acquired under varying environmental conditions and target densities, using a tracking-by-detection approach. We also adopt a deep CNN appearance representation by training a verification-identification network (VerIdNet) on large-scale person re-identification data sets. We construct an augmented likelihood in a principled manner from these deep CNN appearance features and spatio-temporal information. Furthermore, we solve the problem of two or more targets sharing an identical label by considering the weight propagated with each confirmed hypothesis. Extensive experiments on the MOT16 and MOT17 benchmark data sets show that our tracker significantly outperforms several state-of-the-art trackers in terms of tracking accuracy.
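To make the two mechanisms named in the abstract more concrete, the following is a minimal Python sketch, not the authors' formulation. It assumes that the augmented likelihood is the product of an isotropic Gaussian spatial term and an appearance term given by the cosine similarity of two feature vectors mapped to [0, 1], and that a label conflict is resolved by keeping the confirmed hypothesis with the largest weight. The function names, the `sigma` parameter, and the dictionary layout of a hypothesis are all hypothetical.

```python
import math


def spatial_likelihood(z, predicted, sigma):
    """Isotropic 2-D Gaussian density of detection centre z around a predicted position."""
    dx, dy = z[0] - predicted[0], z[1] - predicted[1]
    norm = 2.0 * math.pi * sigma ** 2
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / norm


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def augmented_likelihood(z, predicted, feat_det, feat_track, sigma=10.0):
    """Assumed form: spatial Gaussian term multiplied by an appearance term in [0, 1]."""
    appearance = 0.5 * (1.0 + cosine_similarity(feat_det, feat_track))
    return spatial_likelihood(z, predicted, sigma) * appearance


def resolve_duplicate_labels(hypotheses):
    """Assumed rule: for each label, keep only the confirmed hypothesis with the largest weight."""
    best = {}
    for hyp in hypotheses:
        label = hyp["label"]
        if label not in best or hyp["weight"] > best[label]["weight"]:
            best[label] = hyp
    return list(best.values())


if __name__ == "__main__":
    # A detection at the predicted position with a matching appearance feature.
    print(augmented_likelihood((100.0, 50.0), (100.0, 50.0), [1.0, 0.0], [1.0, 0.0]))
    # Two confirmed hypotheses share label 1; only the heavier one survives.
    confirmed = [
        {"label": 1, "weight": 0.3},
        {"label": 1, "weight": 0.7},
        {"label": 2, "weight": 0.5},
    ]
    print(resolve_duplicate_labels(confirmed))
```

In this sketch the appearance term only rescales the spatial likelihood, so a detection far from the predicted position stays unlikely even when its appearance matches well. How the discarded duplicates are handled (for example, relabelled or dropped) is left open here because the abstract does not specify it.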