CV · Nov 28, 2024

Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation

Finlay G. C. Hudson, William A. P. Smith

arXiv:2411.19210v12.01 citations · h-index: 1

Originality: Highly original

AI Analysis

The paper addresses the challenge of segmenting occluded objects in videos, a recurring problem in computer vision applications, and presents a novel approach rather than an incremental improvement.

The paper tackles amodal video object segmentation without requiring pretrained class labels: inference is zero-shot and needs only a single query mask from the first frame in which the object is visible. Its contributions include the TABE-51 dataset, which provides highly accurate ground truth amodal masks, and a specialized evaluation framework that isolates amodal completion performance.

We present Track Anything Behind Everything (TABE), a novel dataset, pipeline, and evaluation framework for zero-shot amodal completion from visible masks. Unlike existing methods that require pretrained class labels, our approach uses a single query mask from the first frame where the object is visible, enabling flexible, zero-shot inference. Our dataset, TABE-51, provides highly accurate ground truth amodal segmentation masks without the need for human estimation or 3D reconstruction. Our TABE pipeline is specifically designed to handle amodal completion, even in scenarios where objects are completely occluded. We also introduce a specialised evaluation framework that isolates amodal completion performance, free from the influence of traditional visual segmentation metrics.
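
The abstract does not define the evaluation metrics, so the following is only an illustrative sketch of what "isolating amodal completion performance" could mean in practice. It assumes one plausible approach: scoring IoU only over the occluded part of the object (amodal mask minus visible mask), so that the easy visible pixels do not inflate the score. The function names (`occluded_region`, `iou`, `occluded_iou`) and the nested-list mask format are hypothetical and are not taken from the paper or its code.

```python
from typing import List

# Binary mask as nested lists: 1 = object pixel, 0 = background.
# (Assumed format for illustration; not the paper's data format.)
Mask = List[List[int]]


def occluded_region(amodal: Mask, visible: Mask) -> Mask:
    """Pixels inside the amodal mask that are not in the visible mask."""
    return [
        [1 if a and not v else 0 for a, v in zip(row_a, row_v)]
        for row_a, row_v in zip(amodal, visible)
    ]


def iou(pred: Mask, target: Mask) -> float:
    """Standard intersection over union between two binary masks."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly (e.g. a frame with no occlusion).
    return inter / union if union else 1.0


def occluded_iou(pred_amodal: Mask, gt_amodal: Mask, visible: Mask) -> float:
    """IoU restricted to the hidden part of the object (assumed metric)."""
    return iou(
        occluded_region(pred_amodal, visible),
        occluded_region(gt_amodal, visible),
    )


# Toy 1x4 frame: the object spans all four pixels, but only two are visible.
gt_amodal = [[1, 1, 1, 1]]
visible = [[1, 1, 0, 0]]
pred_amodal = [[1, 1, 1, 0]]  # prediction recovers one of two hidden pixels

full_score = iou(pred_amodal, gt_amodal)                       # 0.75
hidden_score = occluded_iou(pred_amodal, gt_amodal, visible)   # 0.5
```

In this toy case the ordinary IoU is 0.75 because the visible pixels are trivially correct, while the occluded-region score of 0.5 reflects only how well the hidden part was completed.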
