CV · Dec 4, 2018

The Visual Centrifuge: Model-Free Layered Video Representations

arXiv:1812.01461v2 · 51 citations
AI Analysis

This paper addresses video understanding for complex scenes involving multiple mediums, but it appears incremental: it builds on existing layered-representation ideas and contributes new architectures.

The paper tackles the problem of modeling non-Lambertian scenes in videos by proposing a learning-based approach to multi-layered video representation. The resulting models exhibit abilities such as color constancy and reflection separation on real-world videos.

True video understanding requires making sense of non-Lambertian scenes where the color of light arriving at the camera sensor encodes information about not just the last object it collided with, but about multiple mediums -- colored windows, dirty mirrors, smoke or rain. Layered video representations have the potential of accurately modelling realistic scenes but have so far required stringent assumptions on motion, lighting and shape. Here we propose a learning-based approach for multi-layered video representation: we introduce novel uncertainty-capturing 3D convolutional architectures and train them to separate blended videos. We show that these models then generalize to single videos, where they exhibit interesting abilities: color constancy, factoring out shadows and separating reflections. We present quantitative and qualitative results on real-world videos.
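
The abstract's idea of training on blended videos can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the text above: the function names `blend_videos` and `order_agnostic_l1` are hypothetical, the blend is a simple weighted average, and the L1 loss that ignores layer order is one plausible way to score a two-layer separation. No network is defined; the example only builds a training pair and scores a prediction.

```python
import numpy as np


def blend_videos(video_a, video_b, alpha=0.5):
    """Mix two clips of shape (T, H, W, C) into one synthetic input.

    Assumption: a plain weighted average stands in for the blending step.
    """
    if video_a.shape != video_b.shape:
        raise ValueError("clips must share the same shape")
    return alpha * video_a + (1.0 - alpha) * video_b


def order_agnostic_l1(pred_layers, target_layers):
    """L1 error between two predicted and two target layers.

    The model has no way to know which source clip should come first,
    so both assignments are scored and the smaller error is returned.
    """
    pred_1, pred_2 = pred_layers
    target_1, target_2 = target_layers
    direct = np.abs(pred_1 - target_1).mean() + np.abs(pred_2 - target_2).mean()
    swapped = np.abs(pred_1 - target_2).mean() + np.abs(pred_2 - target_1).mean()
    return min(direct, swapped)


# Build one synthetic training example from two random "clips".
rng = np.random.default_rng(0)
clip_a = rng.random((8, 32, 32, 3))
clip_b = rng.random((8, 32, 32, 3))
mixed = blend_videos(clip_a, clip_b)

# A perfect separation returned in the opposite order still gets zero loss.
loss = order_agnostic_l1((clip_b, clip_a), (clip_a, clip_b))
```

In this setup a separation model would take `mixed` as input and be trained to output the two source clips. Because the ground-truth layers are known by construction, no manual annotation is needed.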

Code Implementations: 1 repo
