IV, LG, MM | Sep 8, 2025

Robustness and accuracy of mean opinion scores with hard and soft outlier detection

arXiv:2509.06554v1 | h-index: 4 | QoMEX
Originality: Incremental advance
AI Analysis

This work addresses unreliable outlier detection in mean opinion scores, a problem relevant to researchers and practitioners in multimedia quality assessment. The advance is incremental: it builds on existing methods and adds a new evaluation framework.

The paper tackles the lack of a reliable way to compare outlier detection methods in subjective image and video quality assessment. It proposes an empirical worst-case analysis based on adversarial attacks, shows that the methods perform differently under this analysis, and introduces two new low-complexity methods with excellent worst-case performance.

In subjective assessment of image and video quality, observers rate or compare selected stimuli. Before calculating the mean opinion scores (MOS) for these stimuli from the ratings, it is recommended to identify and deal with outliers that may have given unreliable ratings. Several methods are available for this purpose, some of which have been standardized. These methods are typically based on statistics and sometimes tested by introducing synthetic ratings from artificial outliers, such as random clickers. However, a reliable and comprehensive approach is lacking for comparative performance analysis of outlier detection methods. To fill this gap, this work proposes and applies an empirical worst-case analysis as a general solution. Our method involves evolutionary optimization of an adversarial black-box attack on outlier detection algorithms, where the adversary maximizes the distortion of scale values with respect to ground truth. We apply our analysis to several hard and soft outlier detection methods for absolute category ratings and show their differing performance in this stress test. In addition, we propose two new outlier detection methods with low complexity and excellent worst-case performance. Software for adversarial attacks and data analysis is available.
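The worst-case idea described in the abstract can be illustrated with a toy sketch. The code below is not the authors' software or any standardized screening procedure: the correlation-based rejection rule, the `min_corr` threshold, the (1+1) evolutionary search, the simulated honest observers, and all function names are assumptions chosen for illustration. It treats the outlier detector as a black box and lets a few adversarial observers mutate their absolute category ratings (1 to 5) to maximize the RMSE between the screened MOS and an assumed ground truth.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def column_means(rows):
    """Mean rating per stimulus (rows = observers, columns = stimuli)."""
    return [sum(col) / len(col) for col in zip(*rows)]


def robust_mos(ratings, min_corr=0.75):
    """Illustrative hard rejection (assumed rule): drop observers whose
    ratings correlate weakly with the plain MOS, then average the rest."""
    mos = column_means(ratings)
    kept = [r for r in ratings if pearson(r, mos) >= min_corr]
    return column_means(kept if kept else ratings)


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def simulate_honest(truth, n_obs=15, noise=0.7, seed=1):
    """Hypothetical honest observers: noisy, rounded, clipped to 1..5."""
    rng = random.Random(seed)
    return [[min(5, max(1, round(t + rng.gauss(0, noise)))) for t in truth]
            for _ in range(n_obs)]


def attack(honest, truth, n_adv=3, steps=2000, seed=0):
    """(1+1) evolutionary black-box attack: mutate one adversarial rating
    per step and keep the change if the distortion does not decrease."""
    rng = random.Random(seed)
    n_stim = len(truth)
    adv = [[rng.randint(1, 5) for _ in range(n_stim)] for _ in range(n_adv)]
    start = best = rmse(robust_mos(honest + adv), truth)
    for _ in range(steps):
        cand = [row[:] for row in adv]
        cand[rng.randrange(n_adv)][rng.randrange(n_stim)] = rng.randint(1, 5)
        score = rmse(robust_mos(honest + cand), truth)
        if score >= best:  # elitist selection: never accept a worse attack
            adv, best = cand, score
    return adv, start, best


truth = [1.2, 1.8, 2.5, 3.0, 3.4, 4.0, 4.6]  # assumed ground-truth scale values
honest = simulate_honest(truth)
adv, start, worst = attack(honest, truth)
print(f"RMSE with random adversaries:    {start:.3f}")
print(f"RMSE with optimized adversaries: {worst:.3f}")
```

Because selection is elitist, the distortion found by the search never falls below that of the random starting point. Swapping `robust_mos` for a different screening rule, hard or soft, would let the same loop serve as a stress test for that rule.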

