TAO-Amodal: A Benchmark for Tracking Any Object Amodally
This work addresses a critical gap for applications such as autonomous driving by providing a benchmark for amodal object tracking, though the contribution is incremental because it builds on existing datasets and methods.
The paper tackles the lack of amodal perception benchmarks in computer vision by introducing TAO-Amodal, a dataset covering 833 categories with amodal annotations for occluded objects. It finds that existing methods struggle under heavy occlusion, and that simple finetuning schemes can improve amodal tracking and detection metrics on occluded objects by 2.1% and 3.3%, respectively.
Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of \textit{modal} annotations in most benchmarks. To address the scarcity of amodal benchmarks, we introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences. Our dataset includes \textit{amodal} and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame. We investigate the current lay of the land in both amodal tracking and detection by benchmarking state-of-the-art modal trackers and amodal segmentation methods. We find that existing methods, even when adapted for amodal tracking, struggle to detect and track objects under heavy occlusion. To mitigate this, we explore simple finetuning schemes that can increase the amodal tracking and detection metrics of occluded objects by 2.1\% and 3.3\%.
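To make the distinction between modal and amodal boxes concrete, the following is a minimal Python sketch. It is not the TAO-Amodal annotation format or toolkit: the Box class and the helpers clip_to_frame, visible_fraction, and in_frame_fraction are hypothetical names, and the pixel coordinates are invented for illustration. The sketch assumes that the modal (visible) box lies inside the amodal (full-extent) box, and that the amodal box may extend past the image borders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right.

    Hypothetical container for illustration, not a dataset schema.
    """
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Degenerate or inverted boxes count as zero area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def clip_to_frame(box: Box, width: float, height: float) -> Box:
    """Clip a box to the image frame [0, width] x [0, height]."""
    return Box(
        max(0.0, min(box.x1, width)),
        max(0.0, min(box.y1, height)),
        max(0.0, min(box.x2, width)),
        max(0.0, min(box.y2, height)),
    )


def visible_fraction(modal: Box, amodal: Box) -> float:
    """Share of the amodal box area covered by the modal box."""
    full = amodal.area()
    return modal.area() / full if full > 0 else 0.0


def in_frame_fraction(amodal: Box, width: float, height: float) -> float:
    """Share of the amodal box area that falls inside the image frame."""
    full = amodal.area()
    return clip_to_frame(amodal, width, height).area() / full if full > 0 else 0.0


# Illustrative object in a 640x480 frame: the amodal box starts 50 px left
# of the image border, and only the top half of the in-frame part is visible.
amodal = Box(-50, 100, 150, 300)
modal = Box(0, 100, 150, 200)

print(in_frame_fraction(amodal, 640, 480))  # 0.75
print(visible_fraction(modal, amodal))      # 0.375
```

Under these assumptions, a low visible fraction corresponds to the heavy-occlusion regime the abstract describes, and an in-frame fraction below 1 corresponds to an object that is partially out of the camera frame. A fully occluded object would be represented here by a zero-area modal box, giving a visible fraction of 0.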