Real-time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification
This work addresses real-time multiple people tracking for video analysis applications, offering an incremental improvement through a novel integration of methods.
The paper tackles the challenge of unreliable detection in online multi-object tracking by generating redundant candidates from both detection and tracking outputs, and introduces a scoring function based on a fully convolutional neural network for real-time candidate selection. It achieves state-of-the-art performance on a widely used people tracking benchmark while running in real time.
Online multi-object tracking is a fundamental problem in time-critical video analysis applications. A major challenge in the popular tracking-by-detection framework is how to associate unreliable detection results with existing tracks. In this paper, we propose to handle unreliable detection by collecting candidates from the outputs of both detection and tracking. The intuition behind generating redundant candidates is that detection and tracks can complement each other in different scenarios: high-confidence detection results prevent tracking drift in the long term, while track predictions can handle noisy detection caused by occlusion. To select optimally from a considerable number of candidates in real time, we present a novel scoring function based on a fully convolutional neural network that shares most computations on the entire image. Moreover, we adopt a deeply learned appearance representation, trained on large-scale person re-identification datasets, to improve the identification ability of our tracker. Extensive experiments show that our tracker achieves real-time, state-of-the-art performance on a widely used people tracking benchmark.
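The candidate selection idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how candidates are selected once scored, so greedy non-maximum suppression is assumed here as the selection step. The `Candidate` type, the `select_candidates` function, and the `iou_threshold` and `min_score` parameters are hypothetical names, and the scores are supplied directly as stand-ins for the output of the learned scoring function.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Candidate:
    box: Box
    score: float  # stand-in for the output of the learned scoring function
    source: str   # "detection" or "track" (hypothetical labels)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_candidates(
    candidates: List[Candidate],
    iou_threshold: float = 0.5,  # assumed value, not from the paper
    min_score: float = 0.3,      # assumed value, not from the paper
) -> List[Candidate]:
    """Greedy NMS over the pooled detection and track candidates (assumed step)."""
    kept: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.score < min_score:
            break  # remaining candidates score even lower
        # Keep the candidate only if it does not heavily overlap a kept one.
        if all(iou(cand.box, k.box) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


# Pool candidates from both sources before selecting.
candidates = [
    Candidate((0, 0, 10, 10), 0.9, "detection"),       # confident detection
    Candidate((1, 1, 11, 11), 0.6, "track"),           # overlaps the detection
    Candidate((50, 50, 60, 60), 0.7, "track"),         # person missed by the detector
    Candidate((100, 100, 110, 110), 0.1, "detection"), # noisy detection
]
kept = select_candidates(candidates)
```

In this toy input, the confident detection suppresses the overlapping track prediction, while the track prediction at a location with no detection survives, which mirrors the complementary roles the abstract describes for the two candidate sources.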