CV, GR · Feb 5, 2021

Sampling Based Scene-Space Video Processing

arXiv:2102.03011v1 · 44 citations
Originality: Incremental advance
AI Analysis

This work addresses robust scene-space video processing when the available depth and camera pose data are imperfect, offering a way to apply a range of video effects that tolerates errors in that data.

This paper introduces a sampling-based framework for video processing that enables high-quality scene-space effects despite inaccuracies in depth and camera pose estimation. It leverages the redundancy of approximate scene information across multiple video frames to gather and filter pixel samples, achieving practical runtimes on HD video.
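Gathering samples across frames depends on locating the same scene point in other frames from per-pixel depth and camera pose. The following is a minimal sketch of that reprojection. It assumes an ideal pinhole camera with intrinsics matrix `K`, 4×4 camera-to-world pose matrices, and depth measured along the camera's z-axis. The function names `unproject` and `project` are hypothetical and not taken from the paper. Because the depth and pose inputs are only approximate, each reprojected location is approximate too, which is why the method pools samples from many frames instead of trusting any single one.

```python
import numpy as np

def unproject(u, v, depth, K, cam_to_world):
    """Lift pixel (u, v) with z-depth `depth` to a world-space 3D point (pinhole model assumed)."""
    pixel_h = np.array([u, v, 1.0])
    point_cam = depth * (np.linalg.inv(K) @ pixel_h)        # camera-space point
    point_world_h = cam_to_world @ np.append(point_cam, 1.0)  # homogeneous world point
    return point_world_h[:3]

def project(point_world, K, cam_to_world):
    """Project a world-space point into a frame; returns (u, v, z-depth), or None if behind the camera."""
    world_to_cam = np.linalg.inv(cam_to_world)
    point_cam = (world_to_cam @ np.append(point_world, 1.0))[:3]
    if point_cam[2] <= 0:
        return None                                            # point is not in front of this camera
    uvw = K @ point_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2], point_cam[2]
```

For example, with a 500 px focal length and principal point (320, 240), the center pixel of an identity-pose frame at depth 2 reprojects to u = 295, v = 240 in a frame whose camera center is shifted 0.1 units along +x.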

Many compelling video processing effects can be achieved if per-pixel depth information and 3D camera calibrations are known. However, the success of such methods is highly dependent on the accuracy of this "scene-space" information. We present a novel, sampling-based framework for processing video that enables high-quality scene-space video effects in the presence of inevitable errors in depth and camera pose estimation. Instead of trying to improve the explicit 3D scene representation, the key idea of our method is to exploit the high redundancy of approximate scene information that arises due to most scene points being visible multiple times across many frames of video. Based on this observation, we propose a novel pixel gathering and filtering approach. The gathering step is general and collects pixel samples in scene-space, while the filtering step is application-specific and computes a desired output video from the gathered sample sets. Our approach is easily parallelizable and has been implemented on GPU, allowing us to take full advantage of large volumes of video data and facilitating practical runtimes on HD video using a standard desktop computer. Our generic scene-space formulation is able to comprehensively describe a multitude of video processing applications such as denoising, deblurring, super resolution, object removal, computational shutter functions, and other scene-space camera effects. We present results for various casually captured, hand-held, moving, compressed, monocular videos depicting challenging scenes recorded in uncontrolled environments.
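After a sample set has been gathered for an output pixel, the application-specific filter reduces it to a single color. The sketch below shows two illustrative filters. It assumes each sample carries an RGB color and a depth residual, meaning how far the sample's scene point lies from the query point. The Gaussian weighting, the `sigma` value, and the function names are assumptions chosen for illustration and are not the paper's exact filters. A weighted mean fits denoising. A per-channel median tolerates a minority of outlier samples, such as a transient object that covers the point in only a few frames.

```python
import numpy as np

def gaussian_weights(depth_residuals, sigma):
    """Hypothetical weighting: samples whose depth disagrees with the query point get low weight."""
    r = np.asarray(depth_residuals, dtype=float)
    return np.exp(-0.5 * (r / sigma) ** 2)

def denoise_filter(colors, depth_residuals, sigma=0.05):
    """Weighted mean of gathered RGB samples; averaging many noisy views of one point reduces noise."""
    colors = np.asarray(colors, dtype=float)
    w = gaussian_weights(depth_residuals, sigma)
    total = w.sum()
    if total == 0:
        raise ValueError("no sample is consistent with the query point")
    return (w[:, None] * colors).sum(axis=0) / total

def median_filter(colors):
    """Per-channel median of gathered RGB samples; robust to a minority of outliers."""
    return np.median(np.asarray(colors, dtype=float), axis=0)
```

Given four gray samples and one red outlier with a large depth residual, `denoise_filter` stays at gray because the outlier's weight is vanishingly small, while `median_filter` discards the outlier regardless of weights. With all residuals set to zero, the weighted mean is pulled toward red, which shows why some form of consistency weighting matters.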
