LG, CV, IV, ML · Jun 26, 2020

A Loss Function for Generative Neural Networks Based on Watson's Perceptual Model

arXiv:2006.15057v3 · 67 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating realistic imagery with generative models, with applications in computer vision and graphics. It is an incremental advance because it builds on existing perceptual models.

The authors tackled the problem of training Variational Autoencoders (VAEs) to generate realistic images by proposing a new loss function based on Watson's perceptual model, which accounts for human perception of image similarity. VAEs trained with this loss generated high-quality images that were less blurry than those trained with the Euclidean distance or SSIM. Compared to deep-neural-network-based losses, the approach required fewer computational resources and produced images with fewer artifacts.

Training Variational Autoencoders (VAEs) to generate realistic imagery requires a loss function that reflects human perception of image similarity. We propose such a loss function based on Watson's perceptual model, which computes a weighted distance in frequency space and accounts for luminance and contrast masking. We extend the model to color images, increase its robustness to translation by using the Fourier Transform, remove artifacts due to splitting the image into blocks, and make it differentiable. In experiments, VAEs trained with the new loss function generated realistic, high-quality image samples. Compared to using the Euclidean distance and the Structural Similarity Index, the images were less blurry; compared to deep-neural-network-based losses, the new approach required fewer computational resources and generated images with fewer artifacts.
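The weighted frequency-space distance with luminance and contrast masking described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of the classic block-DCT form of Watson's model for a single grayscale image, not the authors' differentiable, Fourier-based, color-aware loss. The function names, the uniform sensitivity table `t`, and the parameter values (`alpha=0.649`, `w=0.7`, `p=4`, commonly quoted for Watson's model) are assumptions made for illustration, and pixel intensities are assumed to be non-negative.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)  # rescale the DC row so that d @ d.T == I
    return d


def blockwise_dct(img, n=8):
    """2D DCT of every non-overlapping n x n block; returns (H/n, W/n, n, n)."""
    h, w = img.shape
    assert h % n == 0 and w % n == 0, "image sides must be multiples of n"
    blocks = img.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    d = dct_matrix(n)
    return d @ blocks @ d.T


def watson_distance(ref, other, t=None, alpha=0.649, w=0.7, p=4.0,
                    eps=1e-10, n=8):
    """Hypothetical sketch of a Watson-style perceptual distance.

    ref, other: 2D arrays of equal shape; ref must be non-negative.
    t: n x n table of base sensitivity thresholds (placeholder: all ones).
    """
    c_ref = blockwise_dct(ref, n)
    c_oth = blockwise_dct(other, n)
    if t is None:
        t = np.ones((n, n))  # assumption: the real model uses a measured table

    # Luminance masking: brighter blocks tolerate larger errors.
    dc = c_ref[..., 0, 0]
    t_lum = t * ((dc[..., None, None] + eps) / (dc.mean() + eps)) ** alpha

    # Contrast masking: strong coefficients hide errors at the same frequency.
    s = np.maximum(t_lum, (np.abs(c_ref) + eps) ** w * t_lum ** (1.0 - w))

    # Pool the threshold-normalized errors with a p-norm.
    return float((np.abs((c_ref - c_oth) / s) ** p).sum() ** (1.0 / p))
```

Because the masking thresholds depend only on the reference image, this sketch is not symmetric in its two arguments. The hard `np.maximum` is also not smooth, which is one reason a variant used as a training loss would need a differentiable replacement, as the abstract notes.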

Code Implementations: 1 repo
