Zero-Shot Multi-Animal Tracking in the Wild
This work addresses the challenge of tracking multiple animals in varied habitats for ecological and behavioral studies. Its contribution is incremental, however, because it combines existing models with heuristics.
The paper tackles multi-animal tracking in the wild with a zero-shot framework built on vision foundation models, reporting strong and consistent performance across diverse species and environments without retraining.
Multi-animal tracking is crucial for understanding animal ecology and behavior. However, it remains a challenging task due to variations in habitat, motion patterns, and species appearance. Traditional approaches typically require extensive model fine-tuning and heuristic design for each application scenario. In this work, we explore the potential of recent vision foundation models for zero-shot multi-animal tracking. By combining a Grounding DINO object detector with the Segment Anything Model 2 (SAM 2) tracker and carefully designed heuristics, we develop a tracking framework that can be applied to new datasets without any retraining or hyperparameter adaptation. Evaluations on ChimpAct, Bird Flock Tracking, AnimalTrack, and a subset of GMOT-40 demonstrate strong and consistent performance across diverse species and environments. The code is available at https://github.com/ecker-lab/SAM2-Animal-Tracking.
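The abstract does not spell out the heuristics, so the following is only an illustrative sketch of one common way a per-frame detector can be paired with a tracker: a detection starts a new track only if it does not overlap any object that is already being tracked. The function names (`iou`, `unmatched_detections`), the box format, and the 0.5 threshold are assumptions made for this example and are not taken from the paper or its repository. The sketch does not call Grounding DINO or SAM 2; it operates on plain bounding boxes.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unmatched_detections(
    tracks: List[Box],
    detections: List[Box],
    iou_threshold: float = 0.5,  # assumed value, not from the paper
) -> List[Box]:
    """Return detections that overlap no existing track.

    In a detector-plus-tracker pipeline, these would be the candidates
    for initializing new tracks in the current frame.
    """
    return [
        det
        for det in detections
        if all(iou(det, trk) < iou_threshold for trk in tracks)
    ]


if __name__ == "__main__":
    tracked = [(0.0, 0.0, 10.0, 10.0)]
    detected = [(1.0, 1.0, 11.0, 11.0), (50.0, 50.0, 60.0, 60.0)]
    # The first detection overlaps the tracked box; the second does not.
    print(unmatched_detections(tracked, detected))
```

In a full system the boxes on the track side would come from the tracker's propagated masks, and the overlap test could be computed on masks rather than boxes; both choices are left open here.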