CV | Apr 8, 2024

DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker

arXiv:2404.05518v1 | 6.59 citations | h-index: 15 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses tracking accuracy issues in drone-based surveillance, though it is an incremental advance that integrates depth and pose estimation into existing frameworks.

The paper tackles the challenges of multi-object tracking in crowded scenes and videos with irregular camera motion by using depth cues and camera pose estimation, achieving superior performance on VisDrone-MOT and UAVDT datasets.

Accurately distinguishing each object is a fundamental goal of multi-object tracking (MOT) algorithms. However, this goal remains challenging for two main reasons. (i) In crowded scenes with occluded objects, the high overlap of object bounding boxes leads to confusion among closely located objects. Humans, however, naturally perceive the depth of elements in a scene when observing 2D videos. Inspired by this, even when the bounding boxes of objects are close on the camera plane, we can differentiate them in the depth dimension, thereby establishing a 3D perception of the objects. (ii) In videos with rapid, irregular camera motion, abrupt changes in object positions can result in ID switches. However, if the camera pose is known, we can compensate for the errors in linear motion models. In this paper, we propose DepthMOT, which (i) detects objects and estimates the scene depth map end-to-end, and (ii) compensates for irregular camera motion through camera pose estimation. Extensive experiments demonstrate the superior performance of DepthMOT on the VisDrone-MOT and UAVDT datasets. The code will be available at https://github.com/JackWoo0831/DepthMOT.
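To make the depth-cue idea in the abstract concrete, here is a minimal sketch of how a depth term might separate objects whose bounding boxes overlap almost completely. It is not taken from the DepthMOT code: the function names (`iou`, `depth_aware_cost`), the `depth_weight` parameter, the relative-depth formula, and the sample values are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def depth_aware_cost(track_boxes, track_depths, det_boxes, det_depths,
                     depth_weight=0.5):
    """Hypothetical association cost: a weighted sum of (1 - IoU) and a
    relative depth difference. Rows are tracks, columns are detections."""
    cost = []
    for tb, td in zip(track_boxes, track_depths):
        row = []
        for db, dd in zip(det_boxes, det_depths):
            iou_cost = 1.0 - iou(tb, db)
            # Relative difference, so the term stays in [0, 1] for positive depths.
            depth_cost = abs(td - dd) / max(td, dd, 1e-6)
            row.append((1.0 - depth_weight) * iou_cost + depth_weight * depth_cost)
        cost.append(row)
    return cost


# Illustrative values: two tracks and two detections share the same box,
# so IoU alone cannot tell them apart; only the depths differ.
box = (100.0, 100.0, 150.0, 200.0)
cost = depth_aware_cost(
    track_boxes=[box, box], track_depths=[5.0, 20.0],
    det_boxes=[box, box], det_depths=[20.5, 5.2],
)
# Greedy per-track choice of the cheapest detection.
best = [min(range(len(row)), key=row.__getitem__) for row in cost]
```

In this sketch the IoU term is zero for every pair, so the match is decided entirely by depth. A full tracker would typically pass such a cost matrix to an assignment solver (for example, the Hungarian algorithm) rather than choose greedily per track.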

