CV · Sep 4, 2024

One Homography is All You Need: IMM-based Joint Homography and Multiple Object State Estimation

Paul Johannes Claasen, Johan Pieter de Villiers

arXiv:2409.02562v43.71 citations · h-index: 3 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses multi-object tracking in computer vision. It reduces reliance on 3D data by requiring only an initial homography estimate, though the advance appears incremental because it builds on existing IMM filter methods.

The paper tackles multi-object tracking (MOT) by proposing IMM-JHSE, a method that uses only an initial homography estimate instead of regular 3D measurements. It models the homography and its dynamics jointly as part of each track's state vector, which removes the explicit influence of camera motion compensation on predicted track positions. Compared with related techniques, it increases HOTA by 2.64 on the DanceTrack dataset and by 2.11 on the KITTI-car dataset.
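To make the role of the homography concrete, the following is a minimal sketch of how a 3x3 homography can map an image point, such as the bottom-centre of a bounding box, onto a ground plane. It is not taken from the IMM-JHSE code. The function names `bbox_bottom_centre` and `project_to_ground`, the image-to-ground direction of `H`, and the choice of the bottom-centre as the reference point are all assumptions made for illustration.

```python
import numpy as np


def bbox_bottom_centre(x1, y1, x2, y2):
    """Return the bottom-centre pixel of a box given as (x1, y1, x2, y2).

    Assumption: image y grows downwards, so y2 is the bottom edge.
    """
    return 0.5 * (x1 + x2), y2


def project_to_ground(H, u, v):
    """Map image point (u, v) to ground-plane coordinates.

    H is assumed to be a 3x3 homography from image to ground plane
    (hypothetical convention; invert it if yours maps ground to image).
    """
    H = np.asarray(H, dtype=float)
    p = H @ np.array([u, v, 1.0])  # homogeneous coordinates
    if abs(p[2]) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return p[:2] / p[2]  # dehomogenise
```

With `H = np.diag([1.0, 1.0, 2.0])`, for example, the image point `(4, 6)` maps to the ground point `(2, 3)`, because the homogeneous result `(4, 6, 2)` is divided by its last component.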

A novel online MOT algorithm, IMM Joint Homography State Estimation (IMM-JHSE), is proposed. IMM-JHSE uses an initial homography estimate as the only additional 3D information, whereas other 3D MOT methods use regular 3D measurements. By jointly modelling the homography matrix and its dynamics as part of track state vectors, IMM-JHSE removes the explicit influence of camera motion compensation techniques on predicted track position states, which was prevalent in previous approaches. Expanding upon this, static and dynamic camera motion models are combined using an IMM filter. A simple bounding box motion model is used to predict bounding box positions to incorporate image plane information. In addition to applying an IMM to camera motion, a non-standard IMM approach is applied where bounding-box-based BIoU scores are mixed with ground-plane-based Mahalanobis distances in an IMM-like fashion to perform association only, making IMM-JHSE robust to motion away from the ground plane. Finally, IMM-JHSE makes use of dynamic process and measurement noise estimation techniques. IMM-JHSE improves upon related techniques, including UCMCTrack, OC-SORT, C-BIoU and ByteTrack on the DanceTrack and KITTI-car datasets, increasing HOTA by 2.64 and 2.11, respectively, while offering competitive performance on the MOT17, MOT20 and KITTI-pedestrian datasets. Using publicly available detections, IMM-JHSE outperforms almost all other 2D MOT methods and is outperformed only by 3D MOT methods -- some of which are offline -- on the KITTI-car dataset. Compared to tracking-by-attention methods, IMM-JHSE shows remarkably similar performance on the DanceTrack dataset and outperforms them on the MOT17 dataset. The code is publicly available: https://github.com/Paulkie99/imm-jhse.
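The abstract combines static and dynamic camera motion models with an IMM filter. As a rough illustration, the sketch below shows the textbook IMM model-probability update and output combination for a bank of models. It is not the authors' implementation, and it does not cover the BIoU and Mahalanobis association step. The names `imm_update_probabilities` and `imm_combine`, the transition matrix, and the likelihood values are hypothetical.

```python
import numpy as np


def imm_update_probabilities(mu, P_trans, likelihoods):
    """Standard IMM model-probability update.

    mu          : (M,) prior model probabilities
    P_trans     : (M, M) Markov matrix, P_trans[i, j] = P(switch from i to j)
    likelihoods : (M,) measurement likelihood under each model's filter
    """
    mu = np.asarray(mu, dtype=float)
    P_trans = np.asarray(P_trans, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    predicted = P_trans.T @ mu           # c_j = sum_i p_ij * mu_i
    unnormalised = predicted * likelihoods
    total = unnormalised.sum()
    if total <= 0.0:
        raise ValueError("all model likelihoods are zero")
    return unnormalised / total


def imm_combine(mu, states):
    """Probability-weighted mean of per-model state estimates.

    states : (M, n) array, one state estimate per model.
    """
    return np.asarray(mu, dtype=float) @ np.asarray(states, dtype=float)
```

For two models (say, a static and a dynamic camera model) with equal priors, a symmetric transition matrix, and likelihoods of 3.0 and 1.0, the updated probabilities are 0.75 and 0.25. The model that explains the measurement better gains weight in the combined estimate.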
