CV | GR | HC | LG | Mar 31, 2021

Video-Specific Autoencoders for Exploring, Editing and Transmitting Videos

arXiv:2103.17261v2 | 3 citations
AI Analysis

This work targets users who need to manipulate videos interactively and transmit them efficiently. It is incremental, since it builds on prior autoencoder formulations for these sub-problems.

The paper aims to let human users explore, edit, and transmit videos efficiently. It trains a simple autoencoder from scratch on multiple frames of a specific video. The resulting latent codes capture the video's spatial and temporal properties and support operations such as visualization, editing, and transmission of sparse low-res frames.

We study video-specific autoencoders that allow a human user to explore, edit, and efficiently transmit videos. Prior work has independently looked at these problems (and sub-problems) and proposed different formulations. In this work, we train a simple autoencoder (from scratch) on multiple frames of a specific video. We observe: (1) latent codes learned by a video-specific autoencoder capture spatial and temporal properties of that video; and (2) autoencoders can project out-of-sample inputs onto the video-specific manifold. These two properties allow us to explore, edit, and efficiently transmit a video using one learned representation. For example, linear operations on latent codes allow users to visualize the contents of a video. Associating latent codes of a video and manifold projection enable users to make desired edits. Interpolating latent codes and manifold projection allow the transmission of sparse low-res frames over a network.
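The two operations named in the abstract, manifold projection and latent-code interpolation, can be illustrated with a minimal sketch. It is not the paper's implementation: `toy_encode` and `toy_decode` are hypothetical stand-ins for a trained video-specific encoder and decoder, frames are flat lists of pixel values, and the function names `project`, `lerp`, and `reconstruct_between` are chosen here for illustration only.

```python
from typing import Callable, List

Vector = List[float]


def project(frame: Vector,
            encode: Callable[[Vector], Vector],
            decode: Callable[[Vector], Vector]) -> Vector:
    """Project an input onto the learned manifold: encode, then decode."""
    return decode(encode(frame))


def lerp(z_a: Vector, z_b: Vector, t: float) -> Vector:
    """Linearly interpolate between two latent codes at fraction t in [0, 1]."""
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same length")
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def reconstruct_between(key_a: Vector, key_b: Vector, n_missing: int,
                        encode: Callable[[Vector], Vector],
                        decode: Callable[[Vector], Vector]) -> List[Vector]:
    """Fill in n_missing frames between two received key frames.

    Both key frames are encoded, their latent codes are interpolated at
    evenly spaced fractions, and each interpolated code is decoded.
    """
    z_a, z_b = encode(key_a), encode(key_b)
    frames = []
    for i in range(1, n_missing + 1):
        t = i / (n_missing + 1)
        frames.append(decode(lerp(z_a, z_b, t)))
    return frames


# Hypothetical toy autoencoder (an assumption, not a trained network):
# the "latent code" is the average of each adjacent pixel pair.
def toy_encode(frame: Vector) -> Vector:
    return [(frame[i] + frame[i + 1]) / 2.0 for i in range(0, len(frame) - 1, 2)]


def toy_decode(code: Vector) -> Vector:
    out: Vector = []
    for value in code:
        out.extend([value, value])  # repeat each latent value twice
    return out


if __name__ == "__main__":
    first = [0.0, 0.0, 4.0, 4.0]
    last = [8.0, 8.0, 0.0, 0.0]
    for frame in reconstruct_between(first, last, 3, toy_encode, toy_decode):
        print(frame)
```

In a real setting the toy functions would be replaced by the encoder and decoder of an autoencoder trained on the frames of the video in question. Only a sparse set of key frames would then need to be sent, with the receiver reconstructing the frames in between.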

