Alexander Raake

CV
h-index 116
8 papers
55 citations
Novelty 23%
AI Score 39

8 Papers

61.6 IV May 25 Code
How Accurate are Video Quality Models for Diffusion-Based Video Super-Resolution?

Benjamin Herb, Steve Göring, Alexander Raake et al.

Recent video super-resolution (VSR) approaches use deep neural networks to enhance low-quality input videos and recover visual detail, with diffusion-based methods in particular showing promising results. In this paper, we investigate whether existing video quality models can be used to assess the performance of these diffusion-based VSR methods, by comparing model predictions with results from a subjective test. The study compares six upscaling methods (Lanczos, Rhea, SCST, DOVE, SeedVR2, Starlight Mini) applied to both compressed (AV1 and DCVC-RT) and uncompressed low-resolution videos, considering play-out on a UHD-1/4K screen. A range of full- and no-reference quality models are used to assess their applicability to this new type of quality degradation, focusing on within-sequence performance. The results highlight that CNN-based full-reference models, such as LPIPS, DISTS, and CVQA-FR, show significantly higher correlation coefficients than both the conventional full-reference models and the tested no-reference models. Most models overestimate the overly sharp results of SCST, with VMAF mainly failing due to spatial inconsistencies introduced by Starlight Mini. None of the tested video quality models reaches sufficient accuracy to replace complementary subjective testing. The reference, degraded and upscaled videos, as well as the user ratings and model scores, are made available with the paper at https://github.com/Telecommunication-Telemedia-Assessment/AVT-VQDB-UHD-1-VSR as open data.

CV Jun 14, 2025 Code
Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity

Mohsen Jenadeleh, Jon Sneyers, Davi Lazzarotto et al.

High dynamic range (HDR) and wide color gamut (WCG) technologies significantly improve color reproduction compared to standard dynamic range (SDR) and standard color gamuts, resulting in more accurate, richer, and more immersive images. However, HDR increases data demands, posing challenges for bandwidth efficiency and compression techniques. Advances in compression and display technologies require more precise image quality assessment, particularly in the high-fidelity range where perceptual differences are subtle. To address this gap, we introduce AIC-HDR2025, the first such HDR dataset, comprising 100 test images generated from five HDR sources, each compressed using four codecs at five compression levels. It covers the high-fidelity range, from visible distortions to compression levels below the visually lossless threshold. A subjective study was conducted using the JPEG AIC-3 test methodology, combining plain and boosted triplet comparisons. In total, 34,560 ratings were collected from 151 participants across four fully controlled labs. The results confirm that AIC-3 enables precise HDR quality estimation, with 95% confidence intervals averaging a width of 0.27 at 1 JND. In addition, several recently proposed objective metrics were evaluated based on their correlation with subjective ratings. The dataset is publicly available.

GR Feb 19, 2025 Code
Appeal prediction for AI up-scaled Images

Steve Göring, Rasmus Merten, Alexander Raake

DNN- or AI-based up-scaling algorithms are gaining in popularity due to the improvements in machine learning. Various up-scaling models using CNNs, GANs or mixed approaches have been published. The majority of models are evaluated using PSNR and SSIM or only a few example images. However, a performance evaluation with a wide range of real-world images and subjective evaluation is missing, which we tackle in this paper. For this reason, we describe our developed dataset, which uses 136 base images and five different up-scaling methods, namely Real-ESRGAN, BSRGAN, waifu2x, KXNet, and Lanczos. Overall the dataset consists of 1496 annotated images. The labeling of our dataset focused on image appeal and has been performed using crowd-sourcing employing our open-source tool AVRate Voyager. We evaluate the appeal of the different methods, and the results indicate that Real-ESRGAN and BSRGAN are the best. Furthermore, we train a DNN to detect which up-scaling method has been used; the trained models have a good overall performance in our evaluation. In addition to this, we evaluate state-of-the-art image appeal and quality models. None of these models showed a high prediction performance, so we also trained two approaches of our own. The first uses transfer learning and has the best performance, and the second model uses signal-based features and a random forest model with good overall performance. We share the data and implementation to allow further research in the context of open science.

CV Oct 17, 2024
Satellite Streaming Video QoE Prediction: A Real-World Subjective Database and Network-Level Prediction Models

Bowen Chen, Zaixi Shang, Jae Won Chung et al.

Demand for streaming services, including satellite, continues to exhibit unprecedented growth. Internet Service Providers find themselves at the crossroads of technological advancements and rising customer expectations. To stay relevant and competitive, these ISPs must ensure their networks deliver optimal video streaming quality, a key determinant of user satisfaction. Towards this end, it is important to have accurate Quality of Experience prediction models in place. However, achieving robust performance by these models requires extensive data sets labeled by subjective opinion scores on videos impaired by diverse playback disruptions. To bridge this data gap, we introduce the LIVE-Viasat Real-World Satellite QoE Database. This database consists of 179 videos recorded from real-world streaming services affected by various authentic distortion patterns. We also conducted a comprehensive subjective study involving 54 participants, who contributed both continuous-time opinion scores and endpoint (retrospective) QoE scores. Our analysis sheds light on various determinants influencing subjective QoE, such as stall events, spatial resolutions, bitrate, and certain network parameters. We demonstrate the usefulness of this unique new resource by evaluating the efficacy of prevalent QoE-prediction models on it. We also created a new model that maps the network parameters to predicted human perception scores, which can be used by ISPs to optimize the video streaming quality of their networks. Our proposed model, which we call SatQA, is able to accurately predict QoE using only network parameters, without any access to pixel data or video-specific metadata. Its performance, evaluated by Spearman's Rank Order Correlation Coefficient (SROCC), Pearson Linear Correlation Coefficient (PLCC), and Root Mean Squared Error (RMSE), indicates high accuracy and reliability.

HC Feb 3, 2022
Technological Factors Influencing Videoconferencing and Zoom Fatigue

Alexander Raake, Markus Fiedler, Katrin Schoenenberg et al.

The paper presents a conceptual, multidimensional approach to understand the technological factors that are assumed to or even have been proven to contribute to what has been coined as Zoom Fatigue (ZF) or more generally Videoconferencing Fatigue (VCF). With the advent of the Covid-19 pandemic, the usage of VC services has drastically increased, leading to more and more reports about the ZF or VCF phenomenon. The paper is motivated by the fact that some of the media outlets initially starting the debate on what Zoom fatigue is and how it can be avoided, as well as some of the scientific papers addressing the topic, contain assumptions that are rather hypothetical and insufficiently underpinned by scientific evidence. Most of these works acknowledge the lack of evidence and partly suggest directions for future research. This paper intends to deepen the survey of VC-technology-related literature and to provide more existing evidence, where possible, while reviewing some of the already provided support or evidence for certain causal hypotheses. The technological factors dimension and its identified sub-dimensions presented in this paper are embedded within a more holistic four-dimensional conceptual factors model describing the causes for ZF or VCF. The paper describing this overall conceptual model is written by the same group of authors and currently under revision for an Open Access Journal publication. The present paper expands on the technological factors dimension descriptions provided in the overall model paper and provides more detailed analyses and concepts associated with how VC technology may affect users' perception, cognitive load, interaction and communication, possibly leading to stress, exhaustion and fatigue. The paper is currently a living document which will be expanded further with regard to the evidence for or against the impact of certain technological factors.

CV Jun 13, 2020
Semantic-driven Colorization

Man M. Ho, Lu Zhang, Alexander Raake et al.

Recent colorization works implicitly predict the semantic information while learning to colorize black-and-white images. Consequently, the generated color overflows more easily, and the semantic faults are invisible. When humans colorize a photo, our brains first detect and recognize the objects in it, then imagine their plausible colors based on many similar objects we have seen in real life, and finally colorize them, as described in the teaser. In this study, we simulate that human-like action to let our network first learn to understand the photo, then colorize it. Thus, our work can provide plausible colors at a semantic level. In addition, the semantic information of the learned model becomes understandable and open to interaction. We also prove that Instance Normalization is a missing ingredient for colorization, then re-design the inference flow of U-Net to have two streams of data, providing an appropriate way of normalizing the feature maps from the black-and-white image and its semantic map. As a result, our network can provide plausible colors competitive with the typical colorization works for specific objects.

MM Jun 10, 2020
QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)

Andrew Perkis, Christian Timmerer, Sabina Baraković et al.

With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable interdisciplinary differences regarding definitions, scope, and constituents that are required to be addressed so that a coherent understanding of the concepts can be achieved. Such consensus is vital for paving the directionality of the future of immersive media experiences (IMEx) and all related matters. The aim of this white paper is to provide a survey of definitions of immersion and presence which leads to a definition of immersive media experience (IMEx). The Quality of Experience (QoE) for immersive media is described by establishing a relationship between the concepts of QoE and IMEx followed by application areas of immersive media experience. Influencing factors on immersive media experience are elaborated as well as the assessment of immersive media experience. Finally, standardization activities related to IMEx are highlighted and the white paper is concluded with an outlook related to future developments.

CR Jun 12, 2015
Because we care: Privacy Dashboard on Firefox OS

Marta Piekarska, Yun Zhou, Dominik Strohmeier et al.

In this paper we present the Privacy Dashboard -- a tool designed to inform and empower the people using mobile devices, by introducing features such as Remote Privacy Protection, Backup, Adjustable Location Accuracy, Permission Control and Secondary-User Mode. We have implemented our solution on FirefoxOS and conducted user studies to verify the usefulness and usability of our tool. The paper starts with a discussion of different aspects of mobile privacy, how users perceive it and how much they are willing to give up for better usability. Then we describe the tool in detail, presenting what incentives drove us to certain design decisions. During our studies we tried to understand how users interact with the system and what their priorities are. We have verified our hypothesis, and the impact of the educational aspects on the decisions about the privacy settings. We show that by taking a user-centric approach to the development of privacy extensions we can reduce the gap between protection and usability.