CV · Sep 11, 2024

Foundation Models Boost Low-Level Perceptual Similarity Metrics

Abhijay Ghildyal, Nabajeet Barman, Saman Zadtootaghaj

arXiv:2409.07650v210.57 citations · h-index: 18 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses image quality assessment for applications such as image processing and compression; it is rated incremental because it builds on existing foundation-model approaches.

The paper tackles full-reference image quality assessment by using intermediate features of foundation models as perceptual similarity metrics, and shows that, without any training, these features outperform traditional and state-of-the-art learned metrics.

For full-reference image quality assessment (FR-IQA) using deep-learning approaches, the perceptual similarity score between a distorted image and a reference image is typically computed as a distance measure between features extracted from a pretrained CNN or, more recently, a Transformer network. Often, these intermediate features require further fine-tuning or processing with additional neural network layers to align the final similarity scores with human judgments. So far, most IQA models based on foundation models have relied primarily on the final layer or the embedding for quality score estimation. In contrast, this work explores the potential of the intermediate features of these foundation models, which have so far been largely unexplored in the design of low-level perceptual similarity metrics. We demonstrate that the intermediate features are comparatively more effective. Moreover, without requiring any training, these metrics can outperform both traditional and state-of-the-art learned metrics by utilizing distance measures between the features.
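To make the idea of a training-free metric concrete, here is a minimal sketch of a distance computed directly between intermediate features. It is an illustration under stated assumptions, not the authors' method: the abstract does not say which layers or distance measures are used, so the unit-normalized squared L2 distance, the function names `unit_normalize` and `feature_distance`, and the random arrays standing in for real foundation-model activations are all hypothetical choices.

```python
import numpy as np


def unit_normalize(feats, eps=1e-10):
    """Scale each token's feature vector to unit L2 norm (last axis = channels)."""
    norm = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / (norm + eps)


def feature_distance(ref_layers, dist_layers):
    """Training-free distance between two images' intermediate features.

    ref_layers / dist_layers: lists of arrays, one per chosen layer,
    each shaped (num_tokens, channels). The choice of distance here
    (squared L2 on normalized features) is an assumption for illustration.
    """
    if len(ref_layers) != len(dist_layers):
        raise ValueError("both images need features from the same layers")
    total = 0.0
    for ref, dist in zip(ref_layers, dist_layers):
        diff = unit_normalize(ref) - unit_normalize(dist)
        # Squared L2 distance per token, averaged over all tokens in the layer.
        total += float(np.mean(np.sum(diff ** 2, axis=-1)))
    # Average over layers so the score does not grow with the layer count.
    return total / len(ref_layers)


# Toy stand-ins for activations from three intermediate layers.
# In practice these would come from a pretrained model's hidden states.
rng = np.random.default_rng(0)
ref = [rng.normal(size=(196, 768)) for _ in range(3)]
mild = [f + 0.05 * rng.normal(size=f.shape) for f in ref]
heavy = [f + 0.50 * rng.normal(size=f.shape) for f in ref]

d_same = feature_distance(ref, ref)
d_mild = feature_distance(ref, mild)
d_heavy = feature_distance(ref, heavy)
```

In this toy setup the score is zero for identical features and grows with the amount of perturbation. No parameters are learned, which is the property the abstract emphasizes.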
