Learning Multi-target Tracking with Quadratic Object Interactions
This work addresses multi-target tracking in video for applications such as autonomous driving. It is an incremental improvement over prior work, notable mainly for a faster inference algorithm.
The paper tackles multi-target tracking by modeling pairwise interactions between tracks with a quadratic objective and learning the parameters via structured prediction. The proposed greedy algorithm matches the performance of an LP relaxation while running 2-7x faster than a commercial solver, and the learned model outperforms existing methods across several categories on the KITTI tracking benchmark.
We describe a model for multi-target tracking based on associating collections of candidate detections across frames of a video. To model pairwise interactions between different tracks, such as suppression of overlapping tracks and contextual cues about co-occurrence of different objects, we augment a standard min-cost flow objective with quadratic terms between detection variables. We learn the parameters of this model using structured prediction and a loss function that approximates the multi-target tracking accuracy. We evaluate two approaches to finding an optimal set of tracks under the model objective: one based on an LP relaxation, and the other a novel greedy extension to dynamic programming that handles pairwise interactions. We find that the greedy algorithm achieves performance equivalent to the LP relaxation while being 2-7x faster than a commercial solver. The resulting model with learned parameters outperforms existing methods across several categories on the KITTI tracking benchmark.
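
To make the idea of quadratic terms between detection variables concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes binary indicators over three candidate detections, hypothetical unary costs, and a hypothetical upper-triangular pairwise matrix in which a positive entry penalizes two overlapping detections and a negative entry rewards co-occurrence. The greedy loop simply adds one detection at a time while the cost decreases; it ignores the flow (track) constraints and the dynamic-programming structure described above.

```python
def objective(x, unary, pairwise):
    """Cost of a binary selection x: unary terms plus pairwise terms for i < j."""
    n = len(x)
    cost = sum(unary[i] * x[i] for i in range(n))
    cost += sum(pairwise[i][j] * x[i] * x[j]
                for i in range(n) for j in range(i + 1, n))
    return cost


def greedy_select(unary, pairwise):
    """Toy greedy: repeatedly switch on the detection that lowers the cost most."""
    n = len(unary)
    x = [0] * n
    current = 0.0
    while True:
        best_i, best_cost = None, current
        for i in range(n):
            if x[i]:
                continue
            x[i] = 1                      # tentatively add detection i
            cost = objective(x, unary, pairwise)
            x[i] = 0                      # undo the tentative change
            if cost < best_cost:
                best_i, best_cost = i, cost
        if best_i is None:                # no addition improves the cost
            return x, current
        x[best_i] = 1
        current = best_cost


# Hypothetical numbers, chosen only for illustration.
unary = [-2.0, -1.5, -1.0]                # negative cost = confident detection
pairwise = [
    [0.0, 3.0, -0.5],                     # 0 and 1 overlap; 0 and 2 co-occur
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
selection, cost = greedy_select(unary, pairwise)
```

With these assumed numbers, detection 1 is suppressed by its overlap penalty with detection 0, while detection 2 is kept because the co-occurrence term lowers the total cost further.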