AS, CL, LG, MM, SD
Jul 31, 2020

A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals

arXiv:2007.15797v1
22 citations
Originality: Incremental advance
AI Analysis

This work addresses the lack of accurate speech-quality assessment tools for real-world applications such as telecommunications and audio processing. It contributes a model trained on a new crowdsourced dataset, though its improvement over prior methods is incremental.

The paper tackles the problem of predicting human perceptual quality ratings for real-world speech signals. Existing objective measures assess such signals poorly because they are developed from simulated data and their scores do not always correlate strongly with subjective ratings. The proposed model, a pyramid bidirectional LSTM network with attention, achieves statistically lower estimation errors than prior approaches, and its predictions correlate strongly with human judgments.

The real-world capabilities of objective speech quality measures are limited since current measures (1) are developed from simulated data that does not adequately model real environments; or they (2) predict objective scores that are not always strongly correlated with subjective ratings. Additionally, a large dataset of real-world signals with listener quality ratings does not currently exist, which would help facilitate real-world assessment. In this paper, we collect and predict the perceptual quality of real-world speech signals that are evaluated by human listeners. We first collect a large quality rating dataset by conducting crowdsourced listening studies on two real-world corpora. We further develop a novel approach that predicts human quality ratings using a pyramid bidirectional long short-term memory (pBLSTM) network with an attention mechanism. The results show that the proposed model achieves statistically lower estimation errors than prior assessment approaches, where the predicted scores strongly correlate with human judgments.
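
The abstract names the architecture but gives no layer details, so the following is only a minimal sketch of two ideas that a pyramid network with attention typically combines. It assumes the common pyramid convention of concatenating each pair of consecutive frames (halving the sequence length per level) and a simple dot-product attention that pools the remaining frames into one vector. The recurrent (BLSTM) layers that would sit between pyramid levels are omitted, and the function names `pyramid_reduce` and `attention_pool`, the toy input, and the zero query vector are all hypothetical, not taken from the paper.

```python
import math


def pyramid_reduce(frames):
    """Concatenate each pair of consecutive frames, halving the sequence length.

    Assumption: a trailing odd frame is dropped; other conventions (padding) exist.
    """
    if len(frames) % 2 == 1:
        frames = frames[:-1]
    return [frames[i] + frames[i + 1] for i in range(0, len(frames), 2)]


def attention_pool(frames, query):
    """Pool frames into one vector using softmax-normalised dot-product scores."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)
    ]
    return pooled, weights


# Toy input: 8 time frames with 2 features each (illustrative values only).
frames = [[float(t), 1.0] for t in range(8)]
level1 = pyramid_reduce(frames)   # 4 frames, 4 features each
level2 = pyramid_reduce(level1)   # 2 frames, 8 features each

# A zero query scores every frame equally, so the weights are uniform.
pooled, weights = attention_pool(level2, [0.0] * 8)
```

In a trained model the query would be learned and the pooled vector would feed a regression layer that outputs the predicted quality rating; both steps are left out here because the source does not describe them.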

