AS CL CV LG SD IV

Jul 1, 2019

Speaker-independent classification of phonetic segments from raw ultrasound in child speech

Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

arXiv:1907.01413v1

8.622 citations

Originality: Incremental advance

AI Analysis

This work addresses a key bottleneck in speech therapy by enabling more efficient processing of ultrasound data, though it is incremental as it builds on existing methods with speaker adaptation.

The study tackled the challenge of generalizing automatic classification of phonetic segments from raw ultrasound tongue images to unseen speakers, finding that models underperform without speaker-specific data but improve with minimal additional speaker information like the mean ultrasound frame.

Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production. UTI is increasingly being used for speech therapy, making it important to develop automatic methods to assist various time-consuming manual tasks currently performed by speech therapists. A key challenge is to generalize the automatic processing of ultrasound tongue images to previously unseen speakers. In this work, we investigate the classification of phonetic segments (tongue shapes) from raw ultrasound recordings under several training scenarios: speaker-dependent, multi-speaker, speaker-independent, and speaker-adapted. We observe that models underperform when applied to data from speakers not seen at training time. However, when provided with minimal additional speaker information, such as the mean ultrasound frame, the models generalize better to unseen speakers.
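The speaker-independent scenario and the mean-frame idea from the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, and supplying the mean frame as a second input channel is an assumption, since the abstract does not say how the mean frame is given to the model.

```python
import numpy as np


def speaker_mean_frame(frames):
    """Average all ultrasound frames of one speaker: (n, H, W) -> (H, W)."""
    return np.asarray(frames, dtype=np.float64).mean(axis=0)


def add_mean_channel(frames, mean_frame):
    """Pair every frame with the speaker mean frame: (n, H, W) -> (n, 2, H, W).

    Assumption: the mean frame is supplied as an extra input channel.
    """
    frames = np.asarray(frames, dtype=np.float64)
    tiled = np.broadcast_to(mean_frame, frames.shape)
    return np.stack([frames, tiled], axis=1)


def speaker_independent_split(speaker_ids, held_out):
    """Return boolean (train, test) masks so held-out speakers never appear in training."""
    speaker_ids = np.asarray(speaker_ids)
    test = np.isin(speaker_ids, list(held_out))
    return ~test, test
```

In this sketch, the mean frame of an unseen speaker would be computed from that speaker's own recordings, which needs no phonetic labels; that is one way to read "minimal additional speaker information".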
