CV · Apr 11, 2022

Human vs Objective Evaluation of Colourisation Performance

arXiv:2204.05200v11.47 citations · h-index: 30 · Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of costly human evaluation in colorization for researchers and practitioners, but it is incremental as it builds on existing datasets and methods.

The paper addresses the problem of evaluating automatic colorization methods by measuring how well objective measures correlate with human opinion. It finds statistically significant but weak correlations, and reports evidence that hue errors in naturally occurring objects matter most to human observers.

Automatic colourisation of grey-scale images is the process of creating a full-colour image from the grey-scale prior. It is an ill-posed problem, as there are many plausible colourisations for a given grey-scale prior. The current state of the art in auto-colourisation involves image-to-image deep convolutional neural networks, with generative adversarial networks showing the greatest promise. The end goal of colourisation is to produce full-colour images that appear plausible to the human viewer, but human assessment is costly and time-consuming. This work assesses how well commonly used objective measures correlate with human opinion. We also attempt to determine which facets of colourisation have the most significant effect on human opinion. For each of 20 images from the BSD dataset, we create 65 recolourisations made up of local and global changes. Opinion scores are then crowdsourced using Amazon Mechanical Turk and, together with the images, form an extensible dataset called the Human Evaluated Colourisation Dataset (HECD). While we find statistically significant correlations between human-opinion scores and a small number of objective measures, the strength of the correlations is low. There is also evidence that human observers are most intolerant of an incorrect hue in naturally occurring objects.
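
To make the idea of correlating an objective measure with human opinion concrete, the following is a minimal sketch of a rank correlation (Spearman's rho) between one objective score per image and the corresponding mean opinion score. The abstract does not state which correlation statistic or which objective measures the authors used, so the choice of Spearman's rho, the helper names (`average_ranks`, `pearson`, `spearman`), and the numbers are all assumptions made for illustration. The data are invented and are not results from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented illustrative values, NOT data from the paper:
# one hypothetical objective score and one mean opinion score per image.
objective_scores = [31.2, 24.5, 28.9, 19.8, 26.1, 22.4]
mean_opinion_scores = [4.1, 3.6, 3.2, 2.0, 3.9, 2.7]

rho = spearman(objective_scores, mean_opinion_scores)
print(f"Spearman rho = {rho:.3f}")
```

A rank correlation is used in this sketch because opinion scores are ordinal and need not relate linearly to an objective measure. A significance test would be a separate step and is not shown here.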
