CV · Jul 25, 2022

Exploring CLIP for Assessing the Look and Feel of Images

arXiv:2207.12396v2 · 1279 citations · h-index: 128 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the need for tools that assess more abstract aspects of image perception without relying on laboriously collected labeled data. The advance is incremental, since it adapts an existing model (CLIP) to a new task.

The paper tackles the problem of assessing both the quality (look) and abstract perception (feel) of images by exploring CLIP models in a zero-shot manner, showing that CLIP captures meaningful priors that generalize well to different perceptual assessments.

Measuring the perception of visual content is a long-standing problem in computer vision. Many mathematical models have been developed to evaluate the look or quality of an image. Despite the effectiveness of such tools in quantifying degradations such as noise and blurriness levels, such quantification is loosely coupled with human language. When it comes to more abstract perception about the feel of visual content, existing methods can only rely on supervised models that are explicitly trained with labeled data collected via laborious user study. In this paper, we go beyond the conventional paradigms by exploring the rich visual language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. In particular, we discuss effective prompt designs and show an effective prompt pairing strategy to harness the prior. We also provide extensive experiments on controlled datasets and Image Quality Assessment (IQA) benchmarks. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments. Code is available at https://github.com/IceClear/CLIP-IQA.
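The abstract mentions a prompt pairing strategy without spelling it out. One plausible reading, sketched below, scores an image against a pair of opposing prompts (for example "Good photo." versus "Bad photo.") and takes a softmax over the two image-text cosine similarities. This is only an illustrative sketch, not the authors' implementation: the function names, the `scale` temperature, and the toy two-dimensional embeddings are all assumptions, and a real pipeline would obtain the embeddings from a CLIP image and text encoder.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def paired_prompt_score(
    image_emb: Sequence[float],
    pos_emb: Sequence[float],
    neg_emb: Sequence[float],
    scale: float = 100.0,  # assumed temperature on the logits, not from the source
) -> float:
    """Hypothetical helper: softmax weight of the positive prompt in a pair.

    Returns a value in (0, 1); values above 0.5 mean the image embedding
    is closer to the positive prompt than to the negative one.
    """
    s_pos = scale * cosine_similarity(image_emb, pos_emb)
    s_neg = scale * cosine_similarity(image_emb, neg_emb)
    # A softmax over two logits reduces to a sigmoid of their difference.
    return 1.0 / (1.0 + math.exp(s_neg - s_pos))


# Toy embeddings standing in for CLIP outputs (purely illustrative).
image = [1.0, 0.2]
good_prompt = [1.0, 0.0]  # stands in for the embedding of "Good photo."
bad_prompt = [0.0, 1.0]   # stands in for the embedding of "Bad photo."

score = paired_prompt_score(image, good_prompt, bad_prompt)
print(f"score = {score:.4f}")
```

Using a pair rather than a single prompt makes the score relative: it depends only on the difference between the two similarities, so any offset common to both prompts cancels out.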

Code Implementations: 1 repo
