CV · Aug 12, 2021

MUSIQ: Multi-scale Image Quality Transformer

arXiv:2108.05997v1 · 1411 citations
Originality: Incremental advance
AI Analysis

This work improves IQA for applications that aim to enhance visual experience, though the advance is incremental: it adapts existing Transformer methods to a domain-specific bottleneck, the fixed input shape required by CNN batch training.

The paper tackles image quality assessment (IQA) by addressing the fixed-shape constraint of CNN-based methods, which forces input images to be resized and cropped and thereby degrades their quality. It proposes MUSIQ, a multi-scale Transformer that processes images at their native resolution, and reports state-of-the-art performance on datasets such as PaQ-2-PiQ, SPAQ, and KonIQ-10k.

Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ and KonIQ-10k.
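The abstract does not spell out how the multi-scale representation and its embeddings are indexed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each patch's row and column are hashed onto a fixed grid by proportional scaling, that resized variants preserve aspect ratio, and that every patch carries a scale index. The function names, the patch size of 32, the grid size of 10, and the longer-side lengths 224 and 384 are all illustrative assumptions.

```python
import math


def hashed_grid_index(i, n, grid_size):
    """Map patch index i (0 <= i < n) along one axis onto a fixed grid.

    Assumption: proportional hashing, so images with any number of patches
    share the same grid_size learnable positions per axis.
    """
    return i * grid_size // n


def resize_keep_aspect(height, width, longer_side):
    """Return the (height, width) obtained by scaling the longer side to longer_side."""
    scale = longer_side / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))


def patch_positions(height, width, patch_size, grid_size, scale_id):
    """List (grid_row, grid_col, scale_id) for every patch of one image scale."""
    rows = math.ceil(height / patch_size)  # partial border patches count as patches
    cols = math.ceil(width / patch_size)
    return [
        (hashed_grid_index(r, rows, grid_size),
         hashed_grid_index(c, cols, grid_size),
         scale_id)
        for r in range(rows)
        for c in range(cols)
    ]


def multiscale_tokens(height, width, patch_size=32, grid_size=10,
                      longer_sides=(224, 384)):
    """Collect embedding indices for the native image plus its resized variants."""
    # Scale 0 is the native-resolution image (hypothetical convention).
    tokens = patch_positions(height, width, patch_size, grid_size, scale_id=0)
    for k, side in enumerate(longer_sides, start=1):
        h, w = resize_keep_aspect(height, width, side)
        tokens += patch_positions(h, w, patch_size, grid_size, scale_id=k)
    return tokens


if __name__ == "__main__":
    tokens = multiscale_tokens(480, 640)
    print(len(tokens), tokens[:3])
```

In a full model, each `(grid_row, grid_col)` pair would look up a learnable spatial embedding and each `scale_id` a learnable scale embedding, both added to the patch token before the Transformer encoder. Because the grid is fixed, the number of positional parameters stays constant no matter how large or how elongated the input image is.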

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
