CL CV LG | Dec 24, 2013

Speech Recognition Front End Without Information Loss

Matthew Ager, Zoran Cvetkovic, Peter Sollich

arXiv:1312.6849v2 | 2 citations

AI Analysis

This work addresses robust speech recognition for noisy environments, offering an incremental improvement by combining high-dimensional and traditional features.

The paper aims to make automatic speech recognition more robust to additive noise by modelling speech in high-dimensional linear feature domains derived from acoustic waveforms. This yields better phoneme classification and recognition than the analogous PLP and MFCC classifiers below 18 dB SNR, and a likelihood-level combination of the high-dimensional and MFCC features outperforms either representation alone at all noise levels.

Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The motivation behind this approach is twofold: (i) the information in acoustic waveforms that is usually removed in the process of extracting low-dimensional features might aid robust recognition by virtue of structured redundancy analogous to channel coding, (ii) linear feature domains allow for exact noise adaptation, as opposed to representations that involve non-linear processing which makes noise adaptation challenging. Thus, we develop a generative framework for phoneme modelling in high-dimensional linear feature domains, and use it in phoneme classification and recognition tasks. Results show that classification and recognition in this framework perform better than analogous PLP and MFCC classifiers below 18 dB SNR. A combination of the high-dimensional and MFCC features at the likelihood level performs uniformly better than either of the individual representations across all noise levels.
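To illustrate point (ii), here is a minimal sketch of why noise adaptation can be exact in a linear domain. It rests on assumptions the abstract does not state: a single Gaussian class-conditional density and independent white Gaussian noise of known variance. Under those assumptions the noisy observation is again Gaussian, with the noise covariance simply added to the clean covariance. The function names and the convex weighting used for the likelihood-level combination are hypothetical and are not taken from the paper.

```python
import numpy as np


def adapt_gaussian_to_noise(mean, cov, noise_var):
    """Parameters of y = x + n for x ~ N(mean, cov) and independent
    n ~ N(0, noise_var * I). Assumed noise model, not the paper's."""
    dim = mean.shape[0]
    # Independent Gaussians add: means add (noise mean is zero),
    # covariances add. No approximation is involved.
    return mean.copy(), cov + noise_var * np.eye(dim)


def gaussian_loglik(y, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at y."""
    dim = mean.shape[0]
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)


def combine_logliks(ll_a, ll_b, weight):
    """Hypothetical likelihood-level combination: a convex mix of two
    per-class log-likelihoods, with weight in [0, 1]."""
    return weight * ll_a + (1.0 - weight) * ll_b
```

In use, each class model would be adapted with the estimated noise variance before scoring a noisy frame, and the resulting score could then be mixed with a score from an MFCC-based model. By contrast, features that pass through a non-linear stage such as a logarithm do not admit this closed-form update.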
