SDAIAS Jun 15, 2023

Unsupervised speech intelligibility assessment with utterance level alignment distance between teacher and learner Wav2Vec-2.0 representations

arXiv:2306.08845v1 | 1 citation | h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the need for scalable, cost-effective speech intelligibility assessment in computer-assisted language learning systems. It is an incremental advance, since it builds on existing self-supervised representations.

The paper tackles the problem of automatic speech intelligibility detection for language learning by proposing an unsupervised approach using alignment distance between teacher and learner Wav2Vec-2.0 representations, achieving detection accuracies of 90.37%, 92.57%, and 96.58% with different distance measures.

Speech intelligibility is crucial in language learning for effective communication. Thus, to develop computer-assisted language learning systems, automatic speech intelligibility detection (SID) is necessary. Most prior work has assessed intelligibility in a supervised manner using manual annotations, which cost time and money and therefore limit scalability. To overcome this, this work proposes an unsupervised approach for SID. The proposed approach uses the alignment distance computed with dynamic time warping (DTW) between teacher and learner representation sequences as a measure to separate intelligible from non-intelligible speech. We obtain the feature sequences using current state-of-the-art self-supervised representations from Wav2Vec-2.0. We obtained detection accuracies of 90.37%, 92.57% and 96.58% with three alignment distance measures: mean absolute error, mean squared error and cosine distance (equal to one minus cosine similarity), respectively.
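
The alignment distance described in the abstract can be illustrated with a minimal sketch. It assumes the teacher and learner utterances have already been converted to frame-level feature matrices (for example by a Wav2Vec-2.0 encoder, which is not shown here). The function names, the step pattern, and the normalization by warping-path length are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np


def frame_distances(teacher, learner, metric="cosine"):
    """Pairwise frame distances between a (T, D) and an (L, D) feature sequence."""
    if metric == "cosine":
        # Cosine distance = 1 - cosine similarity (assumes no all-zero frames).
        t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
        l = learner / np.linalg.norm(learner, axis=1, keepdims=True)
        return 1.0 - t @ l.T
    diff = teacher[:, None, :] - learner[None, :, :]
    if metric == "mae":
        return np.abs(diff).mean(axis=2)
    if metric == "mse":
        return (diff ** 2).mean(axis=2)
    raise ValueError(f"unknown metric: {metric}")


def dtw_alignment_distance(teacher, learner, metric="cosine"):
    """Utterance-level DTW distance, averaged over the warping path.

    Dividing by the path length is an assumption made here so that long and
    short utterances are comparable; the paper may normalize differently.
    """
    cost = frame_distances(teacher, learner, metric)
    n_t, n_l = cost.shape
    acc = np.full((n_t + 1, n_l + 1), np.inf)   # accumulated cost
    steps = np.zeros((n_t + 1, n_l + 1), dtype=int)  # path length so far
    acc[0, 0] = 0.0
    for i in range(1, n_t + 1):
        for j in range(1, n_l + 1):
            # Standard step pattern: diagonal, vertical, horizontal.
            best, n = min(
                (acc[i - 1, j - 1], steps[i - 1, j - 1]),
                (acc[i - 1, j], steps[i - 1, j]),
                (acc[i, j - 1], steps[i, j - 1]),
            )
            acc[i, j] = cost[i - 1, j - 1] + best
            steps[i, j] = n + 1
    return acc[n_t, n_l] / steps[n_t, n_l]
```

In such a setup, a learner utterance would be flagged as non-intelligible when its distance to the teacher utterance exceeds a threshold; how that threshold is chosen is not stated in the abstract and would have to be decided separately.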

