ASSD Apr 29, 2020

Robust Phonetic Segmentation Using Spectral Transition measure for Non-Standard Recording Environments

arXiv:2004.14859v1
AI Analysis

This work addresses phone-level mis-articulation assessment for speech therapy or diagnostics in noisy, uncontrolled mobile recordings. Its contribution is incremental: it adds a post-processing step to existing Spectral Transition Measure (STM)-based phone segmentation.

The paper tackles robust phone segmentation for articulation error assessment in non-standard recording environments such as mobile devices, and reports improvements of 7% on TIMIT and 10% on Hindi data over the baseline approach.

Phone-level localization of mis-articulation is a key requirement for an automatic articulation error assessment system. A robust phone segmentation technique is essential to aid real-time assessment of phone-level mis-articulations in speech recorded on mobile phones or tablets. This is a non-standard recording set-up with little control over the quality of the recording. We propose a novel post-processing technique to aid Spectral Transition Measure (STM)-based phone segmentation under noisy conditions, such as environmental noise and clipping, that are commonly present in mobile phone recordings. We compare the performance of our approach with phone segmentation using traditional MFCC and PLPCC speech features under Gaussian noise and clipping. The proposed approach was validated on TIMIT and a Hindi speech corpus. It was also used to compute phone boundaries for a set of speech recorded simultaneously on three devices (a laptop, a stationary tablet and a handheld mobile phone) to simulate different audio qualities in a real-time non-standard recording environment. F-ratio was the metric used to compute the accuracy of phone boundary marking. Experimental results show an improvement of 7% for TIMIT and 10% for Hindi data over the baseline approach. Similar results were seen for the set of three recordings collected in-house.
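As a rough illustration of the STM idea the abstract builds on, the sketch below uses one common formulation: for each frame, fit a regression slope to every cepstral coefficient over a short window, then average the squared slopes. Peaks in the resulting curve are taken as candidate phone boundaries. The function names, the window size and the threshold are assumptions made for illustration. The paper's post-processing step is not described in the abstract and is not reproduced here.

```python
def spectral_transition_measure(frames, half_window=2):
    """Return one STM value per frame.

    frames: list of per-frame feature vectors (e.g. cepstral coefficients).
    half_window: number of frames on each side used for the slope fit
                 (an illustrative default, not a value from the paper).
    """
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    offsets = range(-half_window, half_window + 1)
    denom = sum(n * n for n in offsets)
    stm = []
    for m in range(num_frames):
        total = 0.0
        for i in range(num_coeffs):
            # Regression slope of coefficient i around frame m;
            # indices are clamped so edge frames reuse the first/last frame.
            slope = sum(
                n * frames[min(max(m + n, 0), num_frames - 1)][i]
                for n in offsets
            ) / denom
            total += slope * slope
        stm.append(total / num_coeffs)
    return stm


def pick_boundaries(stm, threshold):
    """Return frame indices that are local STM maxima above a threshold."""
    return [
        m
        for m in range(1, len(stm) - 1)
        if stm[m] > threshold and stm[m] >= stm[m - 1] and stm[m] > stm[m + 1]
    ]


# Toy input: 10 frames of one "phone" followed by 10 frames of another.
toy_frames = [[0.0, 0.0, 0.0]] * 10 + [[1.0, 1.0, 1.0]] * 10
toy_stm = spectral_transition_measure(toy_frames)
toy_boundaries = pick_boundaries(toy_stm, threshold=0.01)
```

On this toy input the STM curve is zero away from the change and peaks around the switch between the two segments. Real recordings with added noise or clipping would produce many spurious peaks, which is the kind of problem a post-processing stage is meant to address.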

