AS SD Aug 19, 2021

More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations

Alessandro Ragano, Emmanouil Benetos, Andrew Hines

arXiv:2108.08745v15.116 citations, h-index: 31

Originality: Incremental advance

AI Analysis

This work addresses the challenge of data scarcity in speech quality assessment for multimedia applications, offering an incremental improvement over existing methods.

The paper tackles non-intrusive speech quality assessment with limited annotated data. It proposes two multi-task models that use unlabeled data for feature learning, and the deep clustering-based model outperforms the degradation classifier-based model and three baselines on the TCD-VoIP dataset.

Non-intrusive speech quality assessment is a crucial operation in multimedia applications. The scarcity of annotated data and the lack of a reference signal represent some of the main challenges for designing efficient quality assessment metrics. In this paper, we propose two multi-task models to tackle the problems above. In the first model, we first learn a feature representation with a degradation classifier on a large dataset. Then we perform MOS prediction and degradation classification simultaneously on a small dataset annotated with MOS. In the second approach, the initial stage consists of learning features with a deep clustering-based unsupervised feature representation on the large dataset. Next, we perform MOS prediction and cluster label classification simultaneously on a small dataset. The results show that the deep clustering-based model outperforms the degradation classifier-based model and the 3 baselines (autoencoder features, P.563, and SRMRnorm) on TCD-VoIP. This paper indicates that multi-task learning combined with feature representations from unlabelled data is a promising approach to deal with the lack of large MOS annotated datasets.
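The second stage described in the abstract, where MOS prediction is trained together with an auxiliary classification task, can be illustrated with a joint per-clip loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: the squared-error MOS term, the cross-entropy auxiliary term, the function names, and the `aux_weight` parameter are all hypothetical choices made for illustration. The auxiliary label stands in for either a degradation class (first model) or a cluster label (second model).

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(mos_pred, mos_true, class_logits, class_label, aux_weight=0.5):
    """Joint loss for one clip: squared MOS error plus weighted cross-entropy.

    aux_weight is a hypothetical trade-off parameter; the abstract does not
    state how the two tasks are balanced.
    """
    mos_loss = (mos_pred - mos_true) ** 2
    probs = softmax(class_logits)
    aux_loss = -math.log(probs[class_label])
    return mos_loss + aux_weight * aux_loss


# Illustrative values only: predicted MOS 3.2 against a label of 3.5,
# with three auxiliary classes and the true class at index 0.
loss = multitask_loss(3.2, 3.5, [2.0, 0.1, -1.0], class_label=0)
```

Setting `aux_weight` to 0 reduces the sketch to plain MOS regression, which makes it easy to compare single-task and multi-task training on the same small annotated set.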
