Paula Daudén-Oliver

h-index27

3papers

6citations

Novelty13%

AI Score24

Ranked #178,611 of 201,326 authors (top 89%)#54,097 in CV (top 92%)

3 Papers

CVJul 25, 2024

Assessing invariance to affine transformations in image quality metrics

Nuria Alabau-Bosque, Paula Daudén-Oliver, Jorge Vila-Tomás et al.

Subjective image quality metrics are usually evaluated according to the correlation with human opinion in databases with distortions that may appear in digital media. However, these oversee affine transformations which may represent better the changes in the images actually happening in natural conditions. Humans can be particularly invariant to these natural transformations, as opposed to the digital ones. In this work, we propose a methodology to evaluate any image quality metric by assessing their invariance to affine transformations, specifically: rotation, translation, scaling, and changes in spectral illumination. Here, invariance refers to the fact that certain distances should be neglected if their values are below a threshold. This is what we call invisibility threshold of a metric. Our methodology consists of two elements: (1) the determination of a visibility threshold in a subjective representation common to every metric, and (2) a transduction from the distance values of the metric and this common representation. This common representation is based on subjective ratings of readily available image quality databases. We determine the threshold in such common representation (the first element) using accurate psychophysics. Then, the transduction (the second element) can be trivially fitted for any metric: with the provided threshold extension of the method to any metric is straightforward. We test our methodology with some well-established metrics and find that none of them show human-like invisibility thresholds. This means that tuning the models exclusively to predict the visibility of generic distortions may disregard other properties of human vision as for instance invariances or invisibility thresholds. The data and code are publicly available to test other metrics.

CVSep 2, 2025

Hues and Cues: Human vs. CLIP

Nuria Alabau-Bosque, Jorge Vila-Tomás, Paula Daudén-Oliver et al.

Playing games is inherently human, and a lot of games are created to challenge different human characteristics. However, these tasks are often left out when evaluating the human-like nature of artificial models. The objective of this work is proposing a new approach to evaluate artificial models via board games. To this effect, we test the color perception and color naming capabilities of CLIP by playing the board game Hues & Cues and assess its alignment with humans. Our experiments show that CLIP is generally well aligned with human observers, but our approach brings to light certain cultural biases and inconsistencies when dealing with different abstraction levels that are hard to identify with other testing strategies. Our findings indicate that assessing models with different tasks like board games can make certain deficiencies in the models stand out in ways that are difficult to test with the commonly used benchmarks.

CVDec 13, 2024

RAID-Database: human Responses to Affine Image Distortions

Paula Daudén-Oliver, David Agost-Beltran, Emilio Sansano-Sansano et al.

Image quality databases are used to train models for predicting subjective human perception. However, most existing databases focus on distortions commonly found in digital media and not in natural conditions. Affine transformations are particularly relevant to study, as they are among the most commonly encountered by human observers in everyday life. This Data Descriptor presents a set of human responses to suprathreshold affine image transforms (rotation, translation, scaling) and Gaussian noise as convenient reference to compare with previously existing image quality databases. The responses were measured using well established psychophysics: the Maximum Likelihood Difference Scaling method. The set contains responses to 864 distorted images. The experiments involved 105 observers and more than 20000 comparisons of quadruples of images. The quality of the dataset is ensured because (a) it reproduces the classical Piéron's law, (b) it reproduces classical absolute detection thresholds, and (c) it is consistent with conventional image quality databases but improves them according to Group-MAD experiments.