IV CV · Dec 27, 2020

Learning Generalized Spatial-Temporal Deep Feature Representation for No-Reference Video Quality Assessment

Baoliang Chen, Lingyu Zhu, Guo Li, Hongfei Fan, Shiqi Wang

arXiv:2012.13936v2

Originality: Incremental advance

AI Analysis

This work addresses no-reference video quality assessment that generalizes across varied content, resolutions, and frame rates. It represents an incremental improvement over existing methods.

This paper proposes a no-reference video quality assessment method that learns spatial-temporal feature representations. It imposes Gaussian distribution constraints on the spatial quality features to reduce the domain gap between video samples, and it uses a pyramid temporal aggregation module, combining short-term and long-term memory, to aggregate frame-level quality. The method outperforms state-of-the-art methods in cross-dataset settings and achieves comparable results in intra-dataset configurations.

In this work, we propose a no-reference video quality assessment method, aiming to achieve high-generalization capability in cross-content, -resolution and -frame rate quality prediction. In particular, we evaluate the quality of a video by learning effective feature representations in the spatial-temporal domain. In the spatial domain, to tackle the resolution and content variations, we impose Gaussian distribution constraints on the quality features. The unified distribution can significantly reduce the domain gap between different video samples, resulting in a more generalized quality feature representation. Along the temporal dimension, inspired by the mechanism of visual perception, we propose a pyramid temporal aggregation module that involves short-term and long-term memory to aggregate the frame-level quality. Experiments show that our method outperforms the state-of-the-art methods in cross-dataset settings, and achieves comparable performance in intra-dataset configurations, demonstrating the high-generalization capability of the proposed method.
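The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the moment-matching penalty stands in for the Gaussian distribution constraint, and the min-then-mean pooling over several window sizes stands in for the pyramid temporal aggregation module. The function names, the window sizes, and both formulas are assumptions chosen for illustration only.

```python
from statistics import fmean, pvariance


def gaussian_moment_penalty(features):
    """Return 0 when every feature dimension has mean 0 and variance 1.

    `features` is a list of equal-length vectors, one per video sample.
    Moment matching is an assumption of this sketch, not the paper's loss.
    """
    dims = list(zip(*features))  # transpose: one tuple per feature dimension
    penalty = 0.0
    for values in dims:
        mean = fmean(values)
        var = pvariance(values, mu=mean)
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / len(dims)


def pyramid_temporal_pool(frame_scores, window_sizes=(1, 2, 4)):
    """Aggregate frame-level scores at several temporal scales.

    For each (hypothetical) window size, take the worst score in each
    non-overlapping window, average those minima, then average the
    per-scale results. Small windows mimic short-term memory, large
    windows long-term memory.
    """
    per_scale = []
    for w in window_sizes:
        windows = [frame_scores[i:i + w] for i in range(0, len(frame_scores), w)]
        per_scale.append(fmean([min(win) for win in windows]))
    return fmean(per_scale)


if __name__ == "__main__":
    print(gaussian_moment_penalty([[1.0, -1.0], [-1.0, 1.0]]))
    print(pyramid_temporal_pool([4.0, 2.0, 3.0, 1.0]))
```

With window size 1 the pooling reduces to a plain average of the frame scores, while larger windows let brief quality drops weigh more heavily on the video-level score.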
