CV | May 6, 2019

Frame-wise Motion and Appearance for Real-time Multiple Object Tracking

Jimuyang Zhang, Sanping Zhou, Jinjun Wang, Dong Huang

arXiv:1905.02292v1 | 4.79 citations

Originality: Incremental advance

AI Analysis

This paper addresses the efficiency problem in real-time multiple object tracking and offers an incremental improvement over existing methods.

The paper tackles real-time multiple object tracking by proposing a deep neural network that models associations among an indefinite number of objects simultaneously, with an inference cost that does not grow with the object count, and reports competitive results on the MOT17 benchmark.

The main challenge of Multiple Object Tracking (MOT) is efficiently associating an indefinite number of objects between video frames. Standard motion estimators used in tracking, e.g., Long Short-Term Memory (LSTM) networks, handle only a single object, while Re-IDentification (Re-ID) based approaches exhaustively compare object appearances. Both approaches become computationally costly when scaled to a large number of objects, which makes real-time MOT very difficult. To address these problems, we propose a highly efficient Deep Neural Network (DNN) that simultaneously models associations among an indefinite number of objects. The inference computation of the DNN does not increase with the number of objects. Our approach, Frame-wise Motion and Appearance (FMA), computes the Frame-wise Motion Fields (FMF) between two frames, which leads to very fast and reliable matching among a large number of object bounding boxes. As auxiliary information for fixing uncertain matches, Frame-wise Appearance Features (FAF) are learned in parallel with FMFs. Extensive experiments on the MOT17 benchmark show that our method achieves real-time MOT with results competitive with state-of-the-art approaches.
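
To make the idea of matching boxes through a frame-wise motion field concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names (`iou`, `shift_box`, `match_boxes`), the choice to sample the field at each box center, the greedy IoU matching, and the 0.5 threshold are all assumptions made for illustration. The motion field is taken as given (in the paper it would come from the network, once per frame pair), so the per-box work here is only a lookup and an overlap test.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout
Flow = Sequence[Sequence[Tuple[float, float]]]  # flow[y][x] = (dx, dy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shift_box(box: Box, flow: Flow) -> Box:
    """Move a box by the motion vector sampled at its center (an assumption)."""
    cx = int((box[0] + box[2]) / 2)
    cy = int((box[1] + box[3]) / 2)
    # Clamp the sample point to the field so boxes near the border still work.
    cy = min(max(cy, 0), len(flow) - 1)
    cx = min(max(cx, 0), len(flow[0]) - 1)
    dx, dy = flow[cy][cx]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def match_boxes(
    prev_boxes: List[Box],
    next_boxes: List[Box],
    flow: Flow,
    iou_threshold: float = 0.5,  # illustrative value, not from the paper
) -> Dict[int, int]:
    """Greedily match boxes in the previous frame to boxes in the next frame."""
    candidates = []
    for i, prev in enumerate(prev_boxes):
        moved = shift_box(prev, flow)
        for j, nxt in enumerate(next_boxes):
            score = iou(moved, nxt)
            if score >= iou_threshold:
                candidates.append((score, i, j))

    # Take the highest-overlap pairs first; each box is used at most once.
    candidates.sort(reverse=True)
    matches: Dict[int, int] = {}
    used_next = set()
    for _, i, j in candidates:
        if i in matches or j in used_next:
            continue
        matches[i] = j
        used_next.add(j)
    return matches
```

Boxes left unmatched by such a step are the kind of uncertain cases for which the abstract says appearance features serve as auxiliary information; how the paper actually combines the two is not specified in this summary.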
