CVAug 6, 2022Code
Transformer-based assignment decision network for multiple object trackingAthena Psalta, Vasileios Tsironis, Konstantinos Karantzalos
Data association is a crucial component for any multiple object tracking (MOT) method that follows the tracking-by-detection paradigm. To generate complete trajectories such methods employ a data association process to establish assignments between detections and existing targets during each timestep. Recent data association approaches try to solve either a multi-dimensional linear assignment task or a network flow minimization problem or tackle it via multiple hypotheses tracking. However, during inference an optimization step that computes optimal assignments is required for every sequence frame inducing additional complexity to any given solution. To this end, in the context of this work we introduce Transformer-based Assignment Decision Network (TADN) that tackles data association without the need of any explicit optimization during inference. In particular, TADN can directly infer assignment pairs between detections and active targets in a single forward pass of the network. We have integrated TADN in a rather simple MOT framework, designed a novel training strategy for efficient end-to-end training and demonstrated the high potential of our approach for online visual tracking-by-detection MOT on several popular benchmarks, i.e. MOT17, MOT20 and UA-DETRAC. Our proposed approach demonstrates strong performance in most evaluation metrics despite its simple nature as a tracker lacking significant auxiliary components such as occlusion handling or re-identification. The implementation of our method is publicly available at https://github.com/psaltaath/tadn-mot.
CVDec 9, 2025Code
What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute ImportanceAthena Psalta, Vasileios Tsironis, Konstantinos Karantzalos
State-of-the-art person re-identification methods achieve impressive accuracy but remain largely opaque, leaving open the question: which high-level semantic attributes do these models actually rely on? We propose MoSAIC-ReID, a Mixture-of-Experts framework that systematically quantifies the importance of pedestrian attributes for re-identification. Our approach uses LoRA-based experts, each linked to a single attribute, and an oracle router that enables controlled attribution analysis. While MoSAIC-ReID achieves competitive performance on Market-1501 and DukeMTMC under the assumption that attribute annotations are available at test time, its primary value lies in providing a large-scale, quantitative study of attribute importance across intrinsic and extrinsic cues. Using generalized linear models, statistical tests, and feature-importance analyses, we reveal which attributes, such as clothing colors and intrinsic characteristics, contribute most strongly, while infrequent cues (e.g. accessories) have limited effect. This work offers a principled framework for interpretable ReID and highlights the requirements for integrating explicit semantic knowledge in practice. Code is available at https://github.com/psaltaath/MoSAIC-ReID
CVJul 7, 2024
Addressing single object tracking in satellite imagery through prompt-engineered solutionsAthena Psalta, Vasileios Tsironis, Andreas El Saer et al.
Object tracking in satellite videos remains a complex endeavor in remote sensing due to the intricate and dynamic nature of satellite imagery. Existing state-of-the-art trackers in computer vision integrate sophisticated architectures, attention mechanisms, and multi-modal fusion to enhance tracking accuracy across diverse environments. However, the challenges posed by satellite imagery, such as background variations, atmospheric disturbances, and low-resolution object delineation, significantly impede the precision and reliability of traditional Single Object Tracking (SOT) techniques. Our study delves into these challenges and proposes prompt engineering methodologies, leveraging the Segment Anything Model (SAM) and TAPIR (Tracking Any Point with per-frame Initialization and temporal Refinement), to create a training-free point-based tracking method for small-scale objects on satellite videos. Experiments on the VISO dataset validate our strategy, marking a significant advancement in robust tracking solutions tailored for satellite imagery in remote sensing applications.