GR · CV · Jul 10, 2017

Deep Bilateral Learning for Real-Time Image Enhancement

Michaël Gharbi, Jiawen Chen, Jonathan T. Barron, Samuel W. Hasinoff, Frédo Durand

arXiv:1707.02880v2 · 37.9828 citations · h-index: 94 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses performance and flexibility in mobile image processing, for applications such as real-time viewfinders and photographic edits. The advance is incremental: it builds on bilateral grid processing and local affine color transforms.

The paper tackles real-time image enhancement on mobile devices. It introduces a neural network architecture that learns to approximate complex image transformations from input-output pairs, processes high-resolution images in milliseconds, and matches the quality of state-of-the-art approximation techniques without requiring the original operator at runtime.

Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher.
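The runtime path described in the abstract (a low-resolution grid of affine transforms, edge-aware upsampling by slicing, then per-pixel application at full resolution) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the coefficient grid is passed in by hand rather than predicted by a network, the guide map is a plain channel mean rather than a learned one, and the names `slice_grid`, `apply_affine`, and `enhance` are hypothetical.

```python
import numpy as np

def slice_grid(grid, guide):
    """Trilinearly sample per-pixel affine coefficients from a bilateral grid.

    grid:  (gh, gw, gd, 3, 4) array, one 3x4 affine matrix per grid cell
    guide: (H, W) array in [0, 1], picks the depth (intensity) coordinate
    Returns an (H, W, 3, 4) array of full-resolution coefficients.
    Corner-aligned sampling is an assumption made for simplicity.
    """
    gh, gw, gd = grid.shape[:3]
    H, W = guide.shape
    ys = np.broadcast_to(np.linspace(0, gh - 1, H)[:, None], (H, W))
    xs = np.broadcast_to(np.linspace(0, gw - 1, W)[None, :], (H, W))
    zs = np.clip(guide, 0.0, 1.0) * (gd - 1)

    def corners(c, size):
        # Lower and upper neighbour index plus the weight of the upper one.
        lo = np.floor(c).astype(int)
        hi = np.minimum(lo + 1, size - 1)
        return lo, hi, c - lo

    y0, y1, wy = corners(ys, gh)
    x0, x1, wx = corners(xs, gw)
    z0, z1, wz = corners(zs, gd)

    out = np.zeros((H, W) + grid.shape[3:])
    for yi, wyi in ((y0, 1 - wy), (y1, wy)):
        for xi, wxi in ((x0, 1 - wx), (x1, wx)):
            for zi, wzi in ((z0, 1 - wz), (z1, wz)):
                w = (wyi * wxi * wzi)[..., None, None]
                out += w * grid[yi, xi, zi]
    return out

def apply_affine(coeffs, image):
    """Apply a 3x4 affine color transform at every pixel of an (H, W, 3) image."""
    linear = np.einsum("hwij,hwj->hwi", coeffs[..., :3], image)
    return linear + coeffs[..., 3]

def enhance(grid, image):
    """Slice the grid with a guide map, then transform the full-res image."""
    guide = image.mean(axis=-1)  # hypothetical stand-in for the learned guide
    return apply_affine(slice_grid(grid, guide), image)
```

Because the guide value selects the depth coordinate, pixels on opposite sides of a strong edge read from different grid cells even when they are spatially adjacent, which is what makes the upsampling edge-preserving. A grid filled with identity matrices leaves the image unchanged, which is a convenient sanity check.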
