AS, SD, Mar 26, 2020

Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders

arXiv:2003.11882v1
AI Analysis

This work addresses the problem of accurately evaluating speech quality for low bit rate vocoders. The contribution is incremental, as it builds on existing metrics.

The study compares traditional and neural-based low bit rate vocoders and finds that existing full-reference speech quality metrics correlate poorly with subjective assessments. The authors aim to use these findings to develop a new metric for generative-model-based coders.

This study compares the performance of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates is contrasted. The coded speech is evaluated for different quality aspects: accuracy of pitch period estimation, word error rates for automatic speech recognition, and the influence of speaker gender and coding delays. A number of performance metrics for speech samples taken from a publicly available database were compared with subjective scores. Results from subjective quality assessment do not correlate well with existing full-reference speech quality metrics. The results provide valuable insights into aspects of the speech signal that will be used to develop a novel metric to accurately predict speech quality from generative-model-based coders.
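The abstract's central comparison, objective metric scores against subjective scores, is commonly quantified with a linear (Pearson) and a rank (Spearman) correlation coefficient. The following is a minimal sketch of that kind of check, not the authors' evaluation code: the function names are hypothetical, and the abstract does not say which correlation measures the paper uses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Placeholder values for illustration only; they are NOT from the paper.
    subjective_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
    objective_scores = [1.0, 4.0, 9.0, 16.0, 25.0]
    print("Pearson: ", pearson(subjective_scores, objective_scores))
    print("Spearman:", spearman(subjective_scores, objective_scores))
```

Reporting both coefficients is a design choice: a metric that orders the coders correctly but on a nonlinear scale scores high on Spearman and lower on Pearson, which separates ranking errors from scaling errors.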
