CV · AI · Dec 16, 2021

HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images

arXiv:2112.09131v2 · 29 citations · Has Code
Originality: Highly original
AI Analysis

The paper addresses the cost of dense video annotation for VOS by training on annotated static images instead, which are far cheaper to label than video.

The paper tackles Video Object Segmentation (VOS) by proposing HODOR, a method that encodes the objects and scene of a frame into high-level descriptors and uses them to re-segment those objects in other frames. Trained on static images, it achieves state-of-the-art performance on the DAVIS and YouTube-VOS benchmarks among methods trained without video annotations.

Existing state-of-the-art methods for Video Object Segmentation (VOS) learn low-level pixel-to-pixel correspondences between frames to propagate object masks across video. This requires a large amount of densely annotated video data, which is costly to annotate, and largely redundant since frames within a video are highly correlated. In light of this, we propose HODOR: a novel method that tackles VOS by effectively leveraging annotated static images for understanding object appearance and scene context. We encode object instances and scene information from an image frame into robust high-level descriptors which can then be used to re-segment those objects in different frames. As a result, HODOR achieves state-of-the-art performance on the DAVIS and YouTube-VOS benchmarks compared to existing methods trained without video annotations. Without any architectural modification, HODOR can also learn from video context around single annotated video frames by utilizing cyclic consistency, whereas other methods rely on dense, temporally consistent annotations. Source code is available at: https://github.com/Ali2500/HODOR
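To make the descriptor idea concrete, here is a minimal sketch of encoding a frame's objects into descriptors, re-segmenting another frame with them, and checking cyclic consistency. It is not the HODOR architecture: mask-average pooling and cosine-similarity matching are stand-ins assumed for illustration, the function names (`encode_descriptors`, `resegment`, `cycle_agreement`) are hypothetical, and the per-pixel features are toy vectors rather than the output of a learned backbone. The cycle check only measures agreement; it is not a differentiable training loss.

```python
import math

def encode_descriptors(features, mask):
    """Average the feature vectors of each labelled region (0 = background)."""
    sums, counts = {}, {}
    for vec, label in zip(features, mask):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def resegment(features, descriptors):
    """Assign each pixel the label of its most similar descriptor."""
    labels = sorted(descriptors)
    return [max(labels, key=lambda l: cosine(vec, descriptors[l])) for vec in features]

def cycle_agreement(frames, mask):
    """Propagate the frame-0 mask forward through all frames and back again,
    then return the fraction of pixels that match the original annotation."""
    order = list(range(1, len(frames))) + list(range(len(frames) - 2, -1, -1))
    current, prev = mask, 0
    for idx in order:
        current = resegment(frames[idx], encode_descriptors(frames[prev], current))
        prev = idx
    return sum(a == b for a, b in zip(current, mask)) / len(mask)

# Toy data: each frame is a flattened list of per-pixel feature vectors.
frame0 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mask0 = [0, 0, 1, 1]  # annotation for frame 0 only
frame1 = [[0.1, 1.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]  # object has moved

descriptors = encode_descriptors(frame0, mask0)
print(resegment(frame1, descriptors))            # [1, 0, 0, 1]
print(cycle_agreement([frame0, frame1], mask0))  # 1.0
```

Only frame 0 carries an annotation here; frame 1 is segmented purely from the descriptors, and the round trip back to frame 0 is what lets a single annotated frame supervise its unlabelled neighbours.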

Code Implementations: 1 repo
