CV · AI · Sep 3, 2023

UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with Geometric Topology Guidance

arXiv:2309.01078v1
Originality: Incremental advance
AI Analysis

This work addresses the high cost of annotating data for multi-object tracking in computer vision applications and represents an incremental advance over existing unsupervised methods.

The paper proposes UnsMOT, an unsupervised multi-object tracking framework that combines appearance, motion, and geometric features, and it reports state-of-the-art results on the HOTA, IDF1, and MOTA metrics.

Object detection has long been a topic of high interest in computer vision literature. Motivated by the fact that annotating data for the multi-object tracking (MOT) problem is immensely expensive, recent studies have turned their attention to the unsupervised learning setting. In this paper, we push forward the state-of-the-art performance of unsupervised MOT methods by proposing UnsMOT, a novel framework that explicitly combines the appearance and motion features of objects with geometric information to provide more accurate tracking. Specifically, we first extract the appearance and motion features using CNN and RNN models, respectively. Then, we construct a graph of objects based on their relative distances in a frame; this graph is fed into a GNN model together with the CNN features to output geometric embeddings of the objects, which are optimized using an unsupervised loss function. Finally, associations between objects are found by matching not only similar extracted features but also the geometric embeddings of detections and tracklets. Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
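The abstract names two steps that are easy to picture in code: building a graph of objects from their relative distances in a frame, and associating tracklets with detections using both feature similarity and geometric embeddings. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The abstract does not specify how edges are chosen, how the GNN or its unsupervised loss is defined, or which matching algorithm is used, so this example assumes a simple distance-threshold graph, treats the embeddings as precomputed vectors, and swaps in greedy matching on a weighted cosine score. All function names and parameters (`build_distance_graph`, `associate`, `radius`, `geo_weight`, `min_score`) are hypothetical.

```python
import math
from itertools import combinations


def build_distance_graph(centers, radius):
    """Connect every pair of detections whose box centres lie within `radius`.

    Assumption: a fixed distance threshold decides the edges; the paper may
    build its graph differently.
    Returns an adjacency list {node: [(neighbour, distance), ...]}.
    """
    graph = {i: [] for i in range(len(centers))}
    for i, j in combinations(range(len(centers)), 2):
        d = math.dist(centers[i], centers[j])
        if d <= radius:
            graph[i].append((j, d))
            graph[j].append((i, d))
    return graph


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(track_feats, det_feats, track_geo, det_geo,
              geo_weight=0.5, min_score=0.3):
    """Greedy one-to-one matching of tracklets to detections.

    The score is a weighted sum of feature similarity and geometric-embedding
    similarity. Greedy matching is a stand-in chosen for brevity, not the
    matcher used in the paper.
    Returns {tracklet_index: detection_index}.
    """
    scored = []
    for t in range(len(track_feats)):
        for d in range(len(det_feats)):
            score = ((1 - geo_weight) * cosine(track_feats[t], det_feats[d])
                     + geo_weight * cosine(track_geo[t], det_geo[d]))
            scored.append((score, t, d))
    scored.sort(reverse=True)  # best-scoring pairs first

    matches, used_tracks, used_dets = {}, set(), set()
    for score, t, d in scored:
        if score < min_score:
            break  # every remaining pair scores lower still
        if t in used_tracks or d in used_dets:
            continue
        matches[t] = d
        used_tracks.add(t)
        used_dets.add(d)
    return matches
```

For example, calling `build_distance_graph([(0, 0), (3, 4), (100, 100)], radius=10)` links only the first two detections, and `associate` leaves a tracklet unmatched when no detection reaches `min_score`, which is one simple way a tracker might decide to start or end a track.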

