CV · Jul 29, 2019

End-to-End Learning Deep CRF models for Multi-Object Tracking

arXiv:1907.12176v1 · 50 citations
Originality: Incremental advance
AI Analysis

This work addresses identity switches and occlusion challenges in multi-object tracking for computer vision applications, representing an incremental improvement over existing deep learning methods.

The paper tackles multi-object tracking under occlusion and in crowded scenes by proposing an end-to-end deep CRF model in which assignment costs (unary potentials) and long-term dependencies among detections (pairwise potentials) are jointly optimized. The authors report state-of-the-art performance among published works on the MOT-2015 and MOT-2016 benchmarks.

Existing deep multi-object tracking (MOT) approaches first learn a deep representation to describe target objects and then associate detection results by optimizing a linear assignment problem. Despite their demonstrated success, it remains challenging to discriminate target objects under mutual occlusion or to reduce identity switches in crowded scenes. In this paper, we propose learning deep conditional random field (CRF) networks, which model the assignment costs as unary potentials and the long-term dependencies among detection results as pairwise potentials. Specifically, we use a bidirectional long short-term memory (LSTM) network to encode the long-term dependencies. We pose CRF inference as a recurrent neural network learning process using the standard gradient descent algorithm, where unary and pairwise potentials are jointly optimized in an end-to-end manner. Extensive experimental results on the challenging MOT datasets, including MOT-2015 and MOT-2016, demonstrate that our approach achieves state-of-the-art performance in comparison with published works on both benchmarks.
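To make the unary/pairwise split in the abstract concrete, here is a minimal sketch of CRF inference unrolled as a fixed number of mean-field iterations, which is one common way to treat CRF inference as a recurrent computation. It is an illustration under stated assumptions, not the paper's formulation: the update rule is generic mean field, the pairwise term is a hand-set constant `conflict_weight` rather than something produced by a bidirectional LSTM, and all function and variable names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field_assign(unary_cost, conflict_weight, n_iters=10):
    """Unrolled mean-field inference (illustrative, not the paper's exact update).

    unary_cost[i][k]      -- cost of assigning detection i to track k (lower is better)
    conflict_weight[i][j] -- hypothetical penalty when detections i and j share a track
    Returns one probability distribution over tracks per detection.
    """
    n = len(unary_cost)
    n_tracks = len(unary_cost[0])
    # Initialise beliefs from the unary term alone.
    q = [softmax([-c for c in row]) for row in unary_cost]
    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            scores = []
            for k in range(n_tracks):
                # Expected penalty for sharing track k with the other detections.
                penalty = sum(
                    conflict_weight[i][j] * q[j][k] for j in range(n) if j != i
                )
                scores.append(-unary_cost[i][k] - penalty)
            new_q.append(softmax(scores))
        q = new_q  # parallel update: every node reads the previous beliefs
    return q


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


# Toy input (assumed values): two detections in one frame, two tracks.
unary = [[0.2, 0.8], [0.4, 0.6]]
conflict = [[0.0, 2.0], [2.0, 0.0]]

unary_only = [argmax([-c for c in row]) for row in unary]
crf_labels = [argmax(dist) for dist in mean_field_assign(unary, conflict)]
print(unary_only, crf_labels)
```

On this toy input, choosing each detection's cheapest track independently places both detections on track 0, while the pairwise penalty pushes the second detection onto track 1. Because every step is differentiable, a loop of this kind could in principle be trained with gradient descent alongside the networks that produce the potentials, which is the general idea the abstract describes.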

