SD / CL / AS, Oct 11, 2023

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

CMU
arXiv:2310.07161v3, 1 citation, h-index: 16
Originality: Synthesis-oriented
AI Analysis

This paper addresses speech enhancement challenges for VoIP users, but the contribution is incremental: it applies existing tools to a specific domain.

The study analyzes how proprietary sender-side denoising on VoIP platforms such as Google Meet and Zoom affects speech quality, using the DNS 2020 dataset and reporting metrics such as PESQ and STOI to quantify the perceptual impact.

Within VoIP (Voice over Internet Protocol) telecommunications, the complexities introduced by acoustic transformations merit rigorous analysis. This research explores the effects of proprietary sender-side denoising and evaluates platforms such as Google Meet and Zoom. The study draws upon the Deep Noise Suppression (DNS) 2020 dataset, ensuring a structured examination tailored to various denoising settings and receiver interfaces. A methodological novelty is introduced via Blinder-Oaxaca decomposition, traditionally an econometric tool, repurposed here to analyze acoustic-phonetic perturbations within VoIP systems. To further ground the implications of these transformations, psychoacoustic metrics, specifically PESQ and STOI, were used to characterize perceptual quality and intelligibility. Taken together, the findings underscore the intricate landscape of VoIP-influenced acoustic dynamics. In addition to the primary findings, a multitude of further metrics are reported, extending the scope of the research. Out-of-domain benchmarking for both time-domain and time-frequency-domain speech enhancement models is also included, adding to the depth and applicability of the inquiry.
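For readers unfamiliar with Blinder-Oaxaca decomposition, the following is a minimal sketch of the textbook twofold form with a single covariate. It is not the paper's implementation: the abstract does not say which variant, covariates, or acoustic-phonetic outcomes the authors used. The function names (`ols_1d`, `oaxaca_twofold`) and the toy numbers are hypothetical and purely illustrative. The idea is to split the gap in a mean outcome between two groups (for example, two denoising settings) into a part explained by differences in the covariate and a part attributed to differences in the fitted coefficients.

```python
from statistics import mean


def ols_1d(x, y):
    """Least-squares intercept and slope for the model y ~ a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def oaxaca_twofold(x_a, y_a, x_b, y_b):
    """Twofold Blinder-Oaxaca decomposition, using group B as the reference.

    gap         = mean(y_a) - mean(y_b)
    explained   = (mean(x_a) - mean(x_b)) * slope_b
    unexplained = (intercept_a - intercept_b) + mean(x_a) * (slope_a - slope_b)
    """
    a_a, b_a = ols_1d(x_a, y_a)
    a_b, b_b = ols_1d(x_b, y_b)
    mx_a, mx_b = mean(x_a), mean(x_b)
    return {
        "gap": mean(y_a) - mean(y_b),
        "explained": (mx_a - mx_b) * b_b,
        "unexplained": (a_a - a_b) + mx_a * (b_a - b_b),
    }


# Synthetic toy data (assumption, not from the paper):
# x is a covariate, y is some measured acoustic feature, A and B are two conditions.
x_a, y_a = [0, 5, 10, 15, 20], [1.0, 1.6, 2.1, 2.4, 3.0]
x_b, y_b = [2, 6, 12, 18, 22], [1.5, 2.1, 3.0, 3.8, 4.6]

result = oaxaca_twofold(x_a, y_a, x_b, y_b)
print(result)
```

Because an OLS fit with an intercept passes through the group means, the explained and unexplained parts sum to the raw gap exactly (up to floating-point error). Choosing group A rather than B as the reference coefficients would give a different, equally valid split.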

