CV · Jun 1, 2025

Depth-Aware Scoring and Hierarchical Alignment for Multiple Object Tracking

arXiv:2506.00774v1 · 2 citations · h-index: 3 · Has Code · ICIP
Originality: Highly original
AI Analysis

This work addresses the limitations of MOT in scenarios with occlusions or visually similar objects. Its novel approach is an incremental but effective step over existing trackers.

The paper tackles the problem of multiple object tracking (MOT) by introducing a depth-aware framework that uses estimated depth and a hierarchical alignment score to improve association accuracy, achieving state-of-the-art results on challenging benchmarks without training or fine-tuning.

Current motion-based multiple object tracking (MOT) approaches rely heavily on Intersection-over-Union (IoU) for object association. Because they do not use 3D features, they are ineffective in scenarios with occlusions or visually similar objects. To address this, our paper presents a novel depth-aware framework for MOT. We estimate depth using a zero-shot approach and incorporate it as an independent feature in the association process. Additionally, we introduce a Hierarchical Alignment Score that refines IoU by integrating both coarse bounding box overlap and fine-grained (pixel-level) alignment, improving association accuracy without requiring additional learnable parameters. To our knowledge, this is the first MOT framework to incorporate 3D features (monocular depth) as an independent decision matrix in the association step. Our framework achieves state-of-the-art results on challenging benchmarks without any training or fine-tuning. The code is available at https://github.com/Milad-Khanchi/DepthMOT
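The abstract does not give the formulas for the Hierarchical Alignment Score or for the depth term, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a simple weighted blend of box IoU and pixel-mask IoU, a depth similarity computed from depths normalized to [0, 1], and a greedy matcher standing in for whatever assignment step the paper uses. All names (`alignment_score`, `depth_similarity`, `associate`) and the weights `w_fine`, `w_depth`, and `min_score` are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a, b):
    """IoU of two pixel masks given as sets of (x, y) coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def alignment_score(track, det, w_fine=0.5):
    """Assumed blend of coarse (box) and fine (pixel-level) overlap."""
    coarse = box_iou(track["box"], det["box"])
    fine = mask_iou(track["mask"], det["mask"])
    return (1 - w_fine) * coarse + w_fine * fine


def depth_similarity(track, det):
    """Assumed depth cue: 1 for equal depth, 0 for maximally different depth."""
    return 1.0 - abs(track["depth"] - det["depth"])


def associate(tracks, dets, w_depth=0.5, min_score=0.3):
    """Greedy one-to-one matching on the combined score (a simplification)."""
    scored = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(dets):
            s = (1 - w_depth) * alignment_score(t, d) + w_depth * depth_similarity(t, d)
            scored.append((s, ti, di))
    scored.sort(reverse=True)  # best-scoring pairs first
    matches, used_t, used_d = [], set(), set()
    for s, ti, di in scored:
        if s < min_score:
            break
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return sorted(matches)


def rect_mask(x1, y1, x2, y2):
    """Toy mask: every integer pixel inside a rectangle."""
    return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}


# Two tracks whose overlap with both detections is identical,
# so only the depth cue can tell them apart.
tracks = [
    {"box": (0, 0, 10, 10), "mask": rect_mask(2, 2, 8, 8), "depth": 0.2},
    {"box": (2, 0, 12, 10), "mask": rect_mask(4, 2, 10, 8), "depth": 0.8},
]
dets = [
    {"box": (1, 0, 11, 10), "mask": rect_mask(3, 2, 9, 8), "depth": 0.8},
    {"box": (1, 0, 11, 10), "mask": rect_mask(3, 2, 9, 8), "depth": 0.2},
]

print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

In this toy scene the overlap-based score is the same for every track-detection pair, which mirrors the occlusion case the abstract describes. The depth term is what pairs each track with the detection at its own depth.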

Code Implementations: 1 repo
