AS LG SD · Apr 7, 2021

Utilizing Self-supervised Representations for MOS Prediction

arXiv:2104.03017v3 · 74 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate speech quality evaluation in speech processing, particularly for large-scale data where subjective tests are costly. It is an incremental advance, since it builds on existing self-supervised methods.

The paper tackles automatic speech quality assessment without clean reference data by using self-supervised pre-trained models for MOS prediction. It reports a significant improvement over the previous state-of-the-art models on Voice Conversion Challenge 2018 and comparable or superior performance on Voice Conversion Challenge 2016.

Speech quality assessment has been a critical issue in speech processing for decades. Existing automatic evaluations usually require clean references or parallel ground truth data, which becomes infeasible as the amount of data grows. Subjective tests, on the other hand, do not need any additional clean or parallel data and correlate better with human perception. However, such tests are expensive and time-consuming because crowd work is necessary. It is thus highly desirable to develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data. In this paper, we use self-supervised pre-trained models for MOS prediction. We show that their representations can distinguish between clean and noisy audio. Then, we fine-tune these pre-trained models, followed by simple linear layers, in an end-to-end manner. The experimental results show that our framework outperforms the two previous state-of-the-art models by a significant margin on Voice Conversion Challenge 2018 and achieves comparable or superior performance on Voice Conversion Challenge 2016. We also conduct an ablation study to further investigate how each module benefits the task. The experiments are implemented with publicly available toolkits and are reproducible.
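To make the abstract's pipeline more concrete, here is a minimal sketch in plain Python of the part that sits on top of the pre-trained model: frame-level representations are pooled into one utterance vector, a linear layer maps it to a MOS estimate, and predictions are compared with human ratings by linear correlation. Everything here is an illustrative assumption rather than the authors' implementation: the function names are hypothetical, mean pooling is one plausible choice, the frame vectors are hand-written stand-ins for encoder outputs, and the fine-tuning step itself (which would need a deep-learning toolkit) is not shown.

```python
import math


def mean_pool(frames):
    """Average frame-level feature vectors over time into one utterance vector."""
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def linear(vec, weights, bias):
    """A single linear layer mapping an utterance vector to a scalar score."""
    return sum(w * x for w, x in zip(weights, vec)) + bias


def predict_mos(frames, weights, bias):
    """Hypothetical head: pooled representation -> linear layer -> MOS estimate."""
    return linear(mean_pool(frames), weights, bias)


def pearson(xs, ys):
    """Linear correlation between predicted scores and human ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Stand-in for encoder output: 2 frames, 2 feature dimensions (made-up values).
    frames = [[1.0, 2.0], [3.0, 4.0]]
    weights, bias = [0.5, 0.5], 1.0  # untrained, illustrative parameters
    print(predict_mos(frames, weights, bias))

    # Toy predicted vs. human scores, only to exercise the metric.
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In a real system the weights of both the encoder and the linear layer would be learned from human-rated utterances, and correlation would be computed on a held-out set.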

Code Implementations: 7 repos

Data from Papers with Code (CC-BY-SA-4.0)

