MM · Sep 28, 2020

Describing Subjective Experiment Consistency by $p$-Value P-P Plot

arXiv:2009.13372v1
Originality: Incremental advance
AI Analysis

The paper addresses researchers' need to know whether subjective data can be trusted in fields such as multimedia quality assessment. The advance is incremental because it builds on existing statistical methods.

The researchers tackled the problem of inconsistent subjective testing results by developing a tool that classifies experiments as consistent or inconsistent and identifies stimuli with irregular score distributions. The tool is based on the discrete Generalized Score Distribution and a bootstrapped G-test. Its classifications aligned with expectations derived from the design descriptions of 21 real-life multimedia quality experiments.

There are phenomena that cannot be measured without subjective testing. However, subjective testing is a complex issue with many influencing factors. These factors interact to yield either precise or incorrect results. Researchers require a tool to classify the results of a subjective experiment as either consistent or inconsistent. This is necessary in order to decide whether to treat the gathered scores as quality ground truth data. Knowing whether subjective scores can be trusted is key to drawing valid conclusions and building functional tools based on those scores (e.g., algorithms assessing the perceived quality of multimedia materials). We provide a tool that classifies a subjective experiment (and all its results) as either consistent or inconsistent. Additionally, the tool identifies stimuli that have an irregular score distribution. The approach is based on treating subjective scores as a random variable coming from the discrete Generalized Score Distribution (GSD). The GSD, in combination with a bootstrapped G-test of goodness-of-fit, allows one to construct a $p$-value P-P plot that visualizes the experiment's consistency. The tool safeguards researchers from using inconsistent subjective data. In this way, it helps ensure that the conclusions they draw and the tools they build are more precise and trustworthy. The proposed approach works in line with expectations drawn solely from the experiment design descriptions of 21 real-life multimedia quality subjective experiments.
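
The pipeline described above (fit a score model per stimulus, run a bootstrapped G-test of goodness-of-fit, then summarize the per-stimulus $p$-values in a P-P plot) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The GSD is not defined in this summary, so a shifted binomial model stands in for it; the 5-point scale, the function names, the number of bootstrap replicates, and the example score counts are all assumptions made for illustration. No decision threshold for calling an experiment inconsistent is shown, since none is given here.

```python
import math
import random

SCORES = (1, 2, 3, 4, 5)  # assumed 5-point rating scale


def fit_binomial(counts):
    """Stand-in model (NOT the GSD): score = 1 + Binomial(4, q), q fitted from the mean."""
    n = sum(counts)
    mean = sum(s * c for s, c in zip(SCORES, counts)) / n
    q = (mean - 1) / 4
    return [math.comb(4, k) * q**k * (1 - q) ** (4 - k) for k in range(5)]


def g_statistic(counts, probs):
    """G = 2 * sum(O * ln(O / E)); cells with zero observations contribute zero."""
    n = sum(counts)
    return 2 * sum(o * math.log(o / (n * p)) for o, p in zip(counts, probs) if o > 0)


def bootstrap_p_value(counts, n_boot=200, rng=random):
    """Parametric bootstrap: share of simulated G values at least as large as the observed one."""
    probs = fit_binomial(counts)
    g_obs = g_statistic(counts, probs)
    n = sum(counts)
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(range(5), weights=probs, k=n)
        sim = [sample.count(i) for i in range(5)]
        # Refit the model to each simulated sample before computing its G value.
        if g_statistic(sim, fit_binomial(sim)) >= g_obs - 1e-12:
            hits += 1
    return hits / n_boot


def pp_plot_points(p_values, grid=20):
    """For each significance level alpha, the fraction of stimuli with p-value <= alpha."""
    m = len(p_values)
    alphas = [i / grid for i in range(1, grid + 1)]
    return [(a, sum(p <= a for p in p_values) / m) for a in alphas]


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up score counts per stimulus; the last one is deliberately bimodal.
    stimuli = [[2, 5, 9, 6, 2], [0, 3, 8, 10, 3], [12, 0, 0, 0, 12]]
    p_values = [bootstrap_p_value(c, rng=rng) for c in stimuli]
    for alpha, frac in pp_plot_points(p_values, grid=10):
        print(f"alpha={alpha:.1f}  fraction of stimuli with p <= alpha: {frac:.2f}")
```

If the fitted model describes every stimulus well, the $p$-values are roughly uniform and the plotted fractions stay close to the diagonal (fraction approximately equal to alpha). Points well above the diagonal at small alpha indicate more poorly fitted stimuli than chance would suggest, and a stimulus with a very small $p$-value is a candidate for an irregular score distribution.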

