CV · AI · LG · RO · May 4, 2023

Tracking through Containers and Occluders in the Wild

arXiv:2305.03052v1 · 18 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses persistent object tracking for computer vision systems in dynamic, real-world scenarios. It represents an incremental advance in benchmarking and model evaluation.

The paper tackles tracking objects through heavy occlusion and containment in cluttered environments. It introduces TCOW, a new benchmark and model for segmenting both target objects and their occluders or containers in videos. It finds that transformer-based models show some capability, but a significant performance gap remains before any of them achieves true object permanence.

Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce $\textbf{TCOW}$, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is to, given a video sequence, segment both the projected extent of the target object, as well as the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.
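
To make the task's inputs and outputs concrete, the following is a minimal sketch of what a per-frame prediction could look like and how it might be scored. All names here (`FramePrediction`, `mask_iou`, `mean_iou`) are hypothetical, and scoring by intersection over union is an assumption: the abstract does not state which metric the benchmark uses. This is not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional

Mask = List[List[bool]]  # H x W binary mask for a single frame


@dataclass
class FramePrediction:
    """Hypothetical per-frame output for the task in the abstract."""
    target: Mask               # projected extent of the target, even when hidden
    occluder: Optional[Mask]   # None when nothing occludes the target
    container: Optional[Mask]  # None when nothing contains the target


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized binary masks.

    Assumption: two empty masks count as a perfect match (1.0).
    """
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return inter / union if union else 1.0


def mean_iou(preds: List[Mask], truths: List[Mask]) -> float:
    """Average the per-frame IoU over a video (assumed aggregation)."""
    scores = [mask_iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)
```

A target that is fully hidden inside a container would still carry a non-empty `target` mask in the ground truth, which is what separates this task from visible-surface segmentation.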

Code Implementations: 1 repo