CV | AI | LG | Feb 8, 2021

Colorization Transformer

arXiv:2102.04432v2 | 193 citations | Has Code
AI Analysis

This paper addresses the problem of generating diverse, high-fidelity colorizations of grayscale images, a task relevant to image processing and content creation.

The Colorization Transformer is a self-attention-based method for diverse, high-fidelity image colorization. It first generates a low-resolution coarse coloring with a conditional autoregressive transformer, then upsamples that coloring to a finely colored high-resolution image with two fully parallel networks. The method outperforms the previous state of the art on ImageNet colorization as measured by FID, and in a Mechanical Turk study human evaluators preferred the highest-rated of three generated colorings over the ground truth in more than 60% of cases.

We present the Colorization Transformer, a novel approach for diverse, high-fidelity image colorization based on self-attention. Given a grayscale image, the colorization proceeds in three steps. We first use a conditional autoregressive transformer to produce a low-resolution coarse coloring of the grayscale image. Our architecture adopts conditional transformer layers to effectively condition on the grayscale input. Two subsequent fully parallel networks upsample the coarse, low-resolution coloring into a finely colored high-resolution image. Sampling from the Colorization Transformer produces diverse colorings whose fidelity outperforms the previous state of the art on colorizing ImageNet, based both on FID results and on a human evaluation in a Mechanical Turk test. Remarkably, in more than 60% of cases human evaluators prefer the highest rated among three generated colorings over the ground truth. The code and pre-trained checkpoints for Colorization Transformer are publicly available at https://github.com/google-research/google-research/tree/master/coltran
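The three steps described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_logits` is a hypothetical stand-in for the conditional autoregressive transformer, the two upsampling networks are replaced by fixed per-pixel rules, the grayscale grid is assumed to be at the coarse resolution already, and the 3-bit coarse color depth and 2x spatial factor are values chosen for the example rather than taken from the abstract.

```python
import math
import random

COARSE_BITS = 3                   # assumption: coarse colors use 3 bits per channel
COARSE_LEVELS = 2 ** COARSE_BITS  # 8 levels per channel
UPSCALE = 2                       # assumption: toy spatial upsampling factor


def toy_logits(gray_value, previous):
    """Hypothetical stand-in for the conditional transformer.

    Scores every coarse level for one channel, conditioned on the grayscale
    pixel and on the previously sampled value (the autoregressive context).
    """
    target = gray_value * COARSE_LEVELS // 256
    scores = []
    for level in range(COARSE_LEVELS):
        score = -abs(level - target)
        if previous is not None:
            score -= 0.5 * abs(level - previous)
        scores.append(score)
    return scores


def sample_categorical(logits, rng):
    """Draw one index from softmax(logits)."""
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def coarse_colorize(gray, rng):
    """Step 1: sample coarse colors one value at a time, in raster order."""
    coarse, previous = [], None
    for row in gray:
        out_row = []
        for gray_value in row:
            pixel = []
            for _ in range(3):  # R, G, B sampled sequentially
                level = sample_categorical(toy_logits(gray_value, previous), rng)
                pixel.append(level)
                previous = level
            out_row.append(tuple(pixel))
        coarse.append(out_row)
    return coarse


def color_upsample(coarse):
    """Step 2: map coarse levels to 8-bit values, each pixel independently."""
    step = 256 // COARSE_LEVELS
    return [[tuple(c * step + step // 2 for c in px) for px in row] for row in coarse]


def spatial_upsample(image, factor=UPSCALE):
    """Step 3: enlarge the grid (nearest neighbour), each pixel independently."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def colorize(gray, seed):
    """Run the three steps; different seeds can give different colorings."""
    rng = random.Random(seed)
    return spatial_upsample(color_upsample(coarse_colorize(gray, rng)))
```

Only the first step is sequential, since each sampled value depends on the ones before it. The two upsampling steps touch every pixel independently, which is the sense in which they are "fully parallel" here. Calling `colorize` with several seeds on the same grayscale grid is the toy analogue of drawing multiple candidate colorings from the model.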
