CV · Aug 11, 2016

Clockwork Convnets for Video Semantic Segmentation

arXiv:1608.03609v1 · 213 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient real-time video segmentation in applications such as autonomous driving and surveillance. It is an incremental advance because it builds on existing convnet architectures.

The paper tackles the high computational cost of video semantic segmentation by proposing clockwork convnets, which schedule layer updates according to semantic stability. The aim is to reduce computation and latency while maintaining accuracy; the approach is evaluated on the Youtube-Objects, NYUD, and Cityscapes datasets.

Recent years have seen tremendous progress in still-image segmentation; however the naïve application of these state-of-the-art algorithms to every video frame requires considerable computation and ignores the temporal continuity inherent in video. We propose a video recognition framework that relies on two key observations: 1) while pixels may change rapidly from frame to frame, the semantic content of a scene evolves more slowly, and 2) execution can be viewed as an aspect of architecture, yielding purpose-fit computation schedules for networks. We define a novel family of "clockwork" convnets driven by fixed or adaptive clock signals that schedule the processing of different layers at different update rates according to their semantic stability. We design a pipeline schedule to reduce latency for real-time recognition and a fixed-rate schedule to reduce overall computation. Finally, we extend clockwork scheduling to adaptive video processing by incorporating data-driven clocks that can be tuned on unlabeled video. The accuracy and efficiency of clockwork convnets are evaluated on the Youtube-Objects, NYUD, and Cityscapes video datasets.
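The scheduling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a toy model, assuming each network stage is a plain Python function on a list of floats, that stage outputs are fused by an element-wise sum (a stand-in for skip connections), and that the adaptive clock fires when the mean absolute change of a stage's input exceeds a threshold. The names `ClockworkNet`, `fixed_clock`, and `adaptive_clock` are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Optional, Sequence

Feature = List[float]
Stage = Callable[[Feature], Feature]
# A clock decides whether a stage is recomputed at time t, given the stage's
# current input and the input it saw at its last update.
Clock = Callable[[int, Feature, Optional[Feature]], bool]


def fixed_clock(period: int) -> Clock:
    """Fire every `period` frames (fixed-rate schedule)."""
    def fire(t: int, current: Feature, last: Optional[Feature]) -> bool:
        return t % period == 0
    return fire


def adaptive_clock(threshold: float) -> Clock:
    """Fire when the stage input has changed enough since the last update.

    Assumption: change is measured as mean absolute difference.
    """
    def fire(t: int, current: Feature, last: Optional[Feature]) -> bool:
        if last is None:
            return True
        diff = sum(abs(a - b) for a, b in zip(current, last)) / len(current)
        return diff > threshold
    return fire


class ClockworkNet:
    """Toy clockwork network: each stage is recomputed only when its clock fires."""

    def __init__(self, stages: Sequence[Stage], clocks: Sequence[Clock]) -> None:
        if len(stages) != len(clocks):
            raise ValueError("need exactly one clock per stage")
        self.stages = list(stages)
        self.clocks = list(clocks)
        self.outputs: List[Optional[Feature]] = [None] * len(self.stages)
        self.inputs_at_update: List[Optional[Feature]] = [None] * len(self.stages)
        self.update_counts = [0] * len(self.stages)

    def step(self, t: int, frame: Feature) -> Feature:
        x = frame
        for i, (stage, clock) in enumerate(zip(self.stages, self.clocks)):
            # Always compute on the first call; afterwards defer to the clock.
            if self.outputs[i] is None or clock(t, x, self.inputs_at_update[i]):
                self.outputs[i] = stage(x)
                self.inputs_at_update[i] = x
                self.update_counts[i] += 1
            # Later stages see the cached output if this stage did not fire.
            x = self.outputs[i]
        # Fuse fresh shallow outputs with possibly stale deep outputs.
        return [sum(values) for values in zip(*self.outputs)]


if __name__ == "__main__":
    shallow = lambda f: [v + 1.0 for v in f]  # cheap stage, updated every frame
    deep = lambda f: [v * 2.0 for v in f]     # costly stage, updated every 2nd frame
    net = ClockworkNet([shallow, deep], [fixed_clock(1), fixed_clock(2)])
    for t in range(3):
        print(t, net.step(t, [float(t + 1)]))
    print("updates per stage:", net.update_counts)
```

In this sketch the deep stage runs on only two of the three frames, and the fused output on the skipped frame combines a fresh shallow result with the cached deep result. Replacing `fixed_clock(2)` with `adaptive_clock(0.5)` would instead recompute the deep stage only when its input has changed by more than the chosen threshold.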

Code Implementations: 1 repo
