SD · LG · AS · Jan 24, 2024

Speech foundation models on intelligibility prediction for hearing-impaired listeners

arXiv:2401.14289v1 · 20 citations · ICASSP
Originality: Synthesis-oriented
AI Analysis

This work addresses speech intelligibility prediction for hearing-impaired listeners. Its contribution is incremental: it applies existing speech foundation models (SFMs) to a new domain with a simple adaptation.

The paper tackles the problem of predicting speech intelligibility for hearing-impaired listeners by evaluating 10 speech foundation models (SFMs) on the Clarity Prediction Challenge 2 (CPC2). Its method produced the winning submission in the challenge.

Speech foundation models (SFMs) have been benchmarked on many speech processing tasks, often achieving state-of-the-art performance with minimal adaptation. However, the SFM paradigm has been significantly less explored for applications of interest to the speech perception community. In this paper, we present a systematic evaluation of 10 SFMs on one such application: speech intelligibility prediction. We focus on the non-intrusive setup of the Clarity Prediction Challenge 2 (CPC2), where the task is to predict the percentage of words correctly perceived by hearing-impaired listeners from speech-in-noise recordings. We propose a simple method that learns a lightweight specialized prediction head on top of frozen SFMs to approach the problem. Our results reveal statistically significant differences in performance across SFMs. Our method resulted in the winning submission in the CPC2, demonstrating its promise for speech perception applications.
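
The "lightweight head on frozen features" idea from the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the abstract does not describe the head, so the mean pooling, the single linear layer with a sigmoid, the squared-error loss, and all names (`frozen_features`, `mean_pool`, `LinearHead`) are assumptions made for illustration. The "SFM" here is a random stand-in that is never updated, and the labels are synthetic.

```python
import math
import random


def frozen_features(num_frames, dim, rng):
    """Hypothetical stand-in for a frozen SFM: returns num_frames frame embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_frames)]


def mean_pool(frames):
    """Average frame embeddings over time into one utterance-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


class LinearHead:
    """Small trainable head: 100 * sigmoid(w . x + b), a percentage in [0, 100]."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 100.0 / (1.0 + math.exp(-z))

    def sgd_step(self, x, target, lr=0.5):
        """One SGD step on squared error, with the target rescaled to [0, 1]."""
        p = self.predict(x) / 100.0
        grad_z = 2.0 * (p - target / 100.0) * p * (1.0 - p)
        self.w = [wi - lr * grad_z * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * grad_z


def rmse(head, data):
    """Root-mean-square error between predicted and target percentages."""
    sq = [(head.predict(x) - t) ** 2 for x, t in data]
    return math.sqrt(sum(sq) / len(sq))


rng = random.Random(0)
dim = 8

# Synthetic dataset: only the head is trained; the feature extractor is untouched.
data = []
for _ in range(64):
    x = mean_pool(frozen_features(num_frames=20, dim=dim, rng=rng))
    target = 100.0 / (1.0 + math.exp(-4.0 * x[0]))  # made-up intelligibility score
    data.append((x, target))

head = LinearHead(dim)
before = rmse(head, data)
for _ in range(200):
    for x, t in data:
        head.sgd_step(x, t)
after = rmse(head, data)

print(f"RMSE before training: {before:.2f}, after: {after:.2f}")
```

In a real setup, the stand-in extractor would be replaced by embeddings from a pretrained speech model whose weights stay fixed, and only the parameters of the head would receive gradient updates.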
