CV · Jan 1, 2022

PatchTrack: Multiple Object Tracking Using Frame Patches

arXiv:2201.00080v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work targets tracking accuracy in computer vision. It is an incremental advance: it extends existing joint-detection-and-tracking methods with a patch-based approach.

The paper tackles multiple object tracking by proposing PatchTrack, a Transformer-based system that uses patches from predicted bounding boxes to integrate object motion and appearance, achieving MOTA scores of 73.71% on MOT16 and 73.59% on MOT17.

Object motion and object appearance are commonly used information in multiple object tracking (MOT) applications, either for associating detections across frames in tracking-by-detection methods or direct track predictions for joint-detection-and-tracking methods. However, not only are these two types of information often considered separately, but also they do not help optimize the usage of visual information from the current frame of interest directly. In this paper, we present PatchTrack, a Transformer-based joint-detection-and-tracking system that predicts tracks using patches of the current frame of interest. We use the Kalman filter to predict the locations of existing tracks in the current frame from the previous frame. Patches cropped from the predicted bounding boxes are sent to the Transformer decoder to infer new tracks. By utilizing both object motion and object appearance information encoded in patches, the proposed method pays more attention to where new tracks are more likely to occur. We show the effectiveness of PatchTrack on recent MOT benchmarks, including MOT16 (MOTA 73.71%, IDF1 65.77%) and MOT17 (MOTA 73.59%, IDF1 65.23%). The results are published on https://motchallenge.net/method/MOT=4725&chl=10.
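The two steps the abstract describes, predicting each track's box in the current frame with a Kalman filter and then cropping a patch at that box, can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the constant-velocity state layout `[cx, cy, w, h, vx, vy, vw, vh]`, the noise level `q`, and the function names `kalman_predict` and `crop_patch` are all assumptions, and the Transformer decoder that consumes the patches is omitted.

```python
import numpy as np


def kalman_predict(x, P, dt=1.0, q=1e-2):
    """Kalman predict step under an assumed constant-velocity model.

    x: state vector [cx, cy, w, h, vx, vy, vw, vh] (hypothetical layout)
    P: 8x8 state covariance
    q: assumed isotropic process-noise variance
    """
    F = np.eye(8)
    F[:4, 4:] = dt * np.eye(4)   # position/size += velocity * dt
    Q = q * np.eye(8)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def crop_patch(frame, box):
    """Crop the patch covered by a (cx, cy, w, h) box, clipped to the frame."""
    cx, cy, w, h = box
    H, W = frame.shape[:2]
    x1 = int(np.clip(round(cx - w / 2), 0, W - 1))
    y1 = int(np.clip(round(cy - h / 2), 0, H - 1))
    x2 = int(np.clip(round(cx + w / 2), x1 + 1, W))   # keep at least 1 px wide
    y2 = int(np.clip(round(cy + h / 2), y1 + 1, H))   # keep at least 1 px tall
    return frame[y1:y2, x1:x2]


# Usage: one track moving right by 2 px and up by 1 px per frame.
track_state = np.array([50.0, 40.0, 20.0, 10.0, 2.0, -1.0, 0.0, 0.0])
track_cov = np.eye(8)
frame = np.zeros((100, 120, 3), dtype=np.uint8)   # dummy H x W x C image

pred_state, pred_cov = kalman_predict(track_state, track_cov)
patch = crop_patch(frame, pred_state[:4])
print(patch.shape)   # (10, 20, 3)
```

In a full system, each such patch would be embedded and passed to the decoder as a query for that track; how PatchTrack does this is not specified in the abstract.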

