IV · CV · MM · May 21, 2022

Making Video Quality Assessment Models Sensitive to Frame Rate Distortions

arXiv:2205.10501v1 · 2 citations · h-index: 116
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific challenge in video quality assessment for streaming and variable frame rate content. It is an incremental improvement that augments existing models with temporal features.

The paper addresses the problem of making Video Quality Assessment (VQA) models sensitive to frame rate distortions, such as those in variable frame rate videos. It proposes a fusion framework that combines temporal features from the GREED model with existing VQA models, and reports significantly improved performance on both HFR/VFR and fixed frame rate datasets.

We consider the problem of capturing distortions arising from changes in frame rate as part of Video Quality Assessment (VQA). Variable frame rate (VFR) videos have become much more common, and streamed videos commonly range from 30 frames per second (fps) up to 120 fps. VFR-VQA offers unique challenges in terms of distortion types as well as in making non-uniform comparisons of reference and distorted videos having different frame rates. The majority of current VQA models require compared videos to be of the same frame rate, but are unable to adequately account for frame rate artifacts. The recently proposed Generalized Entropic Difference (GREED) VQA model succeeds at this task, using natural video statistics models of entropic differences of temporal band-pass coefficients, delivering superior performance on predicting video quality changes arising from frame rate distortions. Here we propose a simple fusion framework, whereby temporal features from GREED are combined with existing VQA models, to improve model sensitivity to frame rate distortions. We find through extensive experiments that this feature fusion significantly boosts model performance on both HFR/VFR datasets and fixed frame rate (FFR) VQA databases. Our results suggest that employing efficient temporal representations can result in much more robust and accurate VQA models when frame rate variations can occur.
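
The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes fusion means concatenating each video's base-model features with its temporal features and then fitting a regressor to subjective scores. The names `fuse_features`, `fit_ridge`, and `predict` are hypothetical, the small ridge regressor is a stand-in for whatever learner the paper uses, and all numbers are synthetic toy values rather than data from any VQA database.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_features(base: Sequence[Sequence[float]],
                  temporal: Sequence[Sequence[float]]) -> List[Vector]:
    """Concatenate each video's base-model features with its temporal features."""
    if len(base) != len(temporal):
        raise ValueError("need one base and one temporal feature vector per video")
    return [list(b) + list(t) for b, t in zip(base, temporal)]


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float],
              lam: float = 1e-6) -> Vector:
    """Solve (A^T A + lam I) w = A^T y, where A is X with a bias column appended.

    The last entry of the returned weight vector is the bias term.
    """
    A = [list(row) + [1.0] for row in X]
    d = len(A[0])
    # Augmented normal-equation system [A^T A + lam I | A^T y].
    M = [[sum(r[i] * r[j] for r in A) + (lam if i == j else 0.0) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(A, y))]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(d):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear prediction; zip stops before the bias, which is added separately."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


# Synthetic toy values for illustration only (not from any dataset).
base_scores = [[0.0], [1.0], [0.0], [1.0], [0.5]]     # hypothetical base-VQA outputs
temporal_feats = [[0.0], [0.0], [1.0], [1.0], [0.2]]  # hypothetical temporal features
mos = [2.0 * b[0] + 3.0 * t[0] + 1.0 for b, t in zip(base_scores, temporal_feats)]

fused = fuse_features(base_scores, temporal_feats)
weights = fit_ridge(fused, mos)
predictions = [predict(weights, x) for x in fused]
```

Concatenation leaves the base model untouched, so the temporal features act as an add-on: any existing VQA model that outputs a score or feature vector per video could, under these assumptions, be extended the same way.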

