CV · Mar 18, 2021

Generic Perceptual Loss for Modeling Structured Output Dependencies

arXiv:2103.10571v1 · 41 citations
Originality: Highly original
AI Analysis

This work provides a generic structured-output loss that removes pre-training requirements, potentially benefiting a wide range of structured output learning tasks in computer vision.

The authors address structured output dependencies in dense prediction by showing that the network structure, not the pre-trained weights, is what makes the perceptual loss effective. They report improved results on semantic segmentation, depth estimation, and instance segmentation over baselines trained with a pixel-wise loss alone.

The perceptual loss has been widely used as an effective loss term in image synthesis tasks, including image super-resolution and style transfer. Its success was believed to lie in the high-level perceptual feature representations extracted from CNNs pretrained on a large set of images. Here we reveal that what matters is the network structure rather than the trained weights. Without any learning, the structure of a deep network is sufficient to capture the dependencies between multiple levels of variable statistics using multiple layers of CNNs. This insight removes the requirements of pre-training and of a particular network structure (commonly, VGG) that were previously assumed for the perceptual loss, thus enabling a significantly wider range of applications. To this end, we demonstrate that a randomly-weighted deep CNN can be used to model the structured dependencies of outputs. On a few dense per-pixel prediction tasks such as semantic segmentation, depth estimation, and instance segmentation, we show improved results from using the extended randomized perceptual loss, compared to baselines using a pixel-wise loss alone. We hope that this simple, extended perceptual loss may serve as a generic structured-output loss that is applicable to most structured output learning tasks.
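The idea in the abstract can be illustrated with a minimal NumPy sketch: a small CNN with fixed, randomly initialized weights is applied to both the prediction and the ground truth, and the loss is the feature-space distance summed over layers. This is not the authors' implementation. The layer widths, 3x3 kernels, He-style weight scaling, ReLU activations, mean-squared distance, and the helper names `conv2d`, `make_random_net`, `features`, and `random_perceptual_loss` are all assumptions made for illustration.

```python
import numpy as np


def conv2d(x, w):
    """Valid 2D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    k = w.shape[-1]
    # windows has shape (C_in, H-k+1, W-k+1, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", windows, w)


def make_random_net(channels, k=3, seed=0):
    """Draw fixed random conv weights; they are never trained.

    The He-style scaling is an assumption, chosen only to keep
    activations at a reasonable magnitude across layers.
    """
    rng = np.random.default_rng(seed)
    weights = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        std = np.sqrt(2.0 / (c_in * k * k))
        weights.append(rng.normal(0.0, std, size=(c_out, c_in, k, k)))
    return weights


def features(x, weights):
    """Return the ReLU feature map produced by every layer."""
    feats = []
    for w in weights:
        x = np.maximum(conv2d(x, w), 0.0)
        feats.append(x)
    return feats


def random_perceptual_loss(pred, target, weights):
    """Sum over layers of the mean squared feature difference."""
    f_pred = features(pred, weights)
    f_target = features(target, weights)
    return sum(np.mean((a - b) ** 2) for a, b in zip(f_pred, f_target))


# Toy structured outputs: 3-channel 16x16 maps (e.g. per-class score maps).
weights = make_random_net([3, 8, 8], seed=0)
rng = np.random.default_rng(1)
target = rng.random((3, 16, 16))
pred = target + 0.1 * rng.standard_normal((3, 16, 16))
loss = random_perceptual_loss(pred, target, weights)
```

Because each feature depends on a spatial neighborhood of the output, the loss penalizes errors in local structure rather than at isolated pixels. In training, a term like this would be added to the usual pixel-wise loss with some weighting factor; that factor, like the architecture above, is a hypothetical choice here.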

