WEnets: A Convolutional Framework for Evaluating Audio Waveforms
This work addresses audio quality assessment for applications such as telecommunications, but its contribution is incremental: it adapts existing convolutional methods to a specific domain.
The authors tackle the problem of evaluating audio waveforms by introducing WEnets, a convolutional framework, and use it to develop NAWEnet, a single-ended network that emulates PESQ, POLQA, and STOI with testing correlations of 0.95, 0.92, and 0.95, respectively, while training on only 50% of the available data.
We describe a new convolutional framework for waveform evaluation, WEnets, and use it to build a Narrowband Audio Waveform Evaluation Network, or NAWEnet. NAWEnet is single-ended (no-reference) and was trained three separate times to emulate PESQ, POLQA, or STOI, reaching testing correlations of 0.95, 0.92, and 0.95, respectively, when trained on only 50% of the available data and tested on 40%. Stacks of 1-D convolutional layers and non-linear downsampling learn which features are important for quality or intelligibility estimation. This straightforward architecture simplifies the interpretation of its inner workings and paves the way for future investigations into higher sample rates and accurate no-reference subjective speech quality predictions.
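To make the idea of stacked 1-D convolutions with non-linear downsampling concrete, the following is a minimal pure-Python sketch, not the NAWEnet implementation. It assumes a single channel, a ReLU activation, max pooling as the non-linear downsampling step, and a final linear read-out that produces one score. The function names, kernel values, pooling sizes, and read-out weights are all hypothetical and chosen only for illustration; the abstract does not specify any of them.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation form, as is common
    in deep-learning libraries)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(xs):
    """Element-wise rectifier (assumed activation, not stated in the source)."""
    return [max(0.0, x) for x in xs]


def max_pool(xs, size):
    """Non-linear downsampling: keep the maximum of each non-overlapping
    window and drop any incomplete trailing window."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def section(xs, kernel, pool):
    """One hypothetical stage: convolution, activation, then downsampling."""
    return max_pool(relu(conv1d(xs, kernel)), pool)


def evaluate(waveform, kernels, pools, weights, bias):
    """Run the waveform through the stacked stages, then map the remaining
    features to a single score with a linear read-out."""
    x = waveform
    for kernel, pool in zip(kernels, pools):
        x = section(x, kernel, pool)
    return sum(w * v for w, v in zip(weights, x)) + bias


# Toy usage: an 8-sample ramp, two stages, one read-out weight.
score = evaluate(
    waveform=list(range(8)),
    kernels=[[-1, 1], [0.5, 0.5]],
    pools=[2, 2],
    weights=[2.0],
    bias=0.5,
)
```

Each stage shortens the signal, so a long waveform segment is reduced step by step to a small feature vector from which a single estimate is read out. In a trained network the kernels and read-out weights would be learned from target values such as PESQ, POLQA, or STOI scores rather than fixed by hand.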