CV | MM | Dec 23, 2024

Predicting Satisfied User and Machine Ratio for Compressed Images: A Unified Approach

arXiv:2412.17477v1 | 1 citation | h-index: 57
Originality: Incremental advance
AI Analysis

This paper addresses the need to optimize image compression so that quality is balanced between human viewing and machine analysis. The advance is incremental because it builds on existing SUR and SMR prediction tasks.

The paper tackles the problem of predicting the perceptual quality of compressed images for both humans and machines. It proposes a unified deep learning model that simultaneously predicts Satisfied User Ratio (SUR) and Satisfied Machine Ratio (SMR), and the authors report that it significantly outperforms state-of-the-art methods.

High-quality images are sought both by humans, for a better viewing experience, and by machines, for more accurate visual analysis. However, images are usually compressed before being consumed, which decreases their quality. Predicting the perceptual quality of compressed images for both humans and machines is therefore valuable, because it can guide the optimization of compression. In this paper, we propose a unified approach to address this. Specifically, we create a deep learning-based model to predict the Satisfied User Ratio (SUR) and Satisfied Machine Ratio (SMR) of compressed images simultaneously. We first pre-train a feature extractor network on a large-scale SMR-annotated dataset with human perception-related quality labels generated by diverse image quality models, which simulates the acquisition of SUR labels. Then, we propose an MLP-Mixer-based network that predicts SUR and SMR by leveraging and fusing the extracted multi-layer features. We introduce a Difference Feature Residual Learning (DFRL) module to learn more discriminative difference features. We further use a Multi-Head Attention Aggregation and Pooling (MHAAP) layer to aggregate difference features and reduce their redundancy. Experimental results indicate that the proposed model significantly outperforms state-of-the-art SUR and SMR prediction methods. Moreover, our joint learning scheme for the human and machine perceptual quality prediction tasks is effective at improving the performance of both.
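
The abstract does not define SUR and SMR. One common reading is that each is the fraction of observers (human viewers for SUR, vision models for SMR) who still find a compressed image acceptable at a given distortion level. The sketch below illustrates that reading only. The function name, the threshold lists, and the convention that a higher level means stronger compression are all hypothetical and do not come from the paper.

```python
from typing import Sequence


def satisfied_ratio(thresholds: Sequence[float], level: float) -> float:
    """Fraction of observers who tolerate distortion `level`.

    Assumption (not from the paper): each observer has one threshold,
    the highest distortion level they still accept, and a higher
    level means stronger compression.
    """
    if not thresholds:
        raise ValueError("need at least one observer threshold")
    satisfied = sum(1 for t in thresholds if level <= t)
    return satisfied / len(thresholds)


# Hypothetical thresholds, made up for illustration only.
human_thresholds = [20, 25, 25, 30, 40]   # five human viewers
machine_thresholds = [10, 15, 35, 35]     # four vision models

sur = satisfied_ratio(human_thresholds, 25)    # SUR at level 25
smr = satisfied_ratio(machine_thresholds, 25)  # SMR at level 25
print(f"SUR={sur:.2f}, SMR={smr:.2f}")
```

Under this reading, a single distortion level yields two ratios that can differ substantially, which is the motivation for predicting both jointly.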

