ASSD, Jun 16, 2021

Detection of Consonant Errors in Disordered Speech Based on Consonant-vowel Segment Embedding

arXiv:2106.08536v1
Originality: Incremental advance
AI Analysis

This work addresses a specific challenge in automatic assessment of speech sound disorders for clinical applications, representing an incremental improvement over prior methods.

The paper tackles the detection of consonant errors in disordered speech, particularly for short and transitory consonants. It extracts embeddings from consonant-vowel segments instead of consonant-only segments, which improves detection performance on these difficult cases.

Speech sound disorder (SSD) is a developmental disorder in which young children have persistent difficulty producing certain speech sounds at the expected age. Consonant errors are the major indicator of SSD in clinical assessment. Previous studies on automatic assessment of SSD found that detection of speech errors involving short and transitory consonants is less satisfactory. This paper investigates a neural-network-based approach to detecting consonant errors in disordered speech that uses consonant-vowel (CV) diphone segments, and compares it with using consonant monophone segments. The underlying assumption is that the vowel part of a CV segment carries important co-articulation information from the consonant. Speech embeddings are extracted from CV segments by a recurrent neural network model. Similarity scores between the embedding of the test segment and the embeddings of the reference segments are then computed to determine whether the test segment is the expected consonant. Experimental results show that using CV segments improves the detection of speech errors involving the "difficult" consonants reported in previous studies.
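
The scoring step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure, aggregation rule, or decision threshold the paper uses, so cosine similarity, averaging over references, and the threshold value of 0.5 are all assumptions made for illustration. The function names are hypothetical, and the embeddings are plain lists of floats standing in for the output of the recurrent model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def is_expected_consonant(
    test_embedding: Sequence[float],
    reference_embeddings: List[Sequence[float]],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> Tuple[bool, float]:
    """Compare a test CV-segment embedding with reference embeddings.

    Returns (accepted, mean_score). Averaging the scores over all
    references is an assumption; other aggregation rules are possible.
    """
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    scores = [cosine_similarity(test_embedding, ref) for ref in reference_embeddings]
    mean_score = sum(scores) / len(scores)
    return mean_score >= threshold, mean_score


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; real embeddings would come from the RNN.
    references = [[1.0, 0.0], [0.9, 0.1]]
    print(is_expected_consonant([1.0, 0.05], references))  # close to the references
    print(is_expected_consonant([0.0, 1.0], references))   # far from the references
```

In this sketch, a test segment whose mean similarity falls below the threshold would be flagged as a possible consonant error. In practice the threshold would have to be tuned on held-out data.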

