CL · May 21

In Silico Modeling of the RAMPHO Buffer: Dissociating Informational and Energetic Masking via Phonetic Entropy in Deep Neural Networks

arXiv:2605.22465 · 27.9

Predicted impact: top 96% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For researchers in speech perception and hearing science, this provides a computational framework to model cognitive masking effects that current speech enhancement systems ignore.

The authors simulated the RAMPHO buffer using phonetic entropy from wav2vec 2.0 to dissociate informational and energetic masking in multi-talker environments. They found a trade-off: removing semantic content reduces informational masking at high SNRs but degrades temporal cues at low SNRs.

The fundamental challenge of listening in multi-talker environments is a cognitive bottleneck, defined by the Ease of Language Understanding (ELU) model as a failure within the RAMPHO episodic buffer. Current deep neural networks for speech enhancement optimize purely for physical acoustics, failing to account for the cognitive penalty of informational masking. Here, we present an in silico simulation of the RAMPHO buffer using the frame-by-frame phonetic entropy of a self-supervised acoustic model (wav2vec 2.0). By contrasting a semantically intact distractor with a phase-decorrelated distractor (the Concentration Shield) across a signal-to-noise ratio (SNR) sweep, we successfully dissociate the cognitive penalty of informational distraction from the physical penalty of energetic decay. The simulation reveals a cognitive-acoustic Pareto optimization problem: destroying a distractor's semantic payload provides a release from informational masking at high SNRs, but fundamentally degrades temporal glimpsing cues at low SNRs.
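The abstract does not say how the SNR sweep, the phase-decorrelated distractor, or the phonetic entropy are computed. The sketch below shows one plausible reading of those three ingredients, under stated assumptions; it is not the authors' code. It assumes (1) the distractor is rescaled so that 10·log10(P_target / P_distractor) equals the requested SNR, (2) the "Concentration Shield" can be approximated by a phase-randomized surrogate that keeps the distractor's long-term magnitude spectrum but scrambles its phase, and (3) "phonetic entropy" means the Shannon entropy of the per-frame softmax over a model's output logits. In practice those logits could come from a CTC head such as Hugging Face `Wav2Vec2ForCTC` (its `.logits` output), but here they are just an array. The names `mix_at_snr`, `phase_randomize`, and `frame_entropy` are hypothetical.

```python
import numpy as np


def mix_at_snr(target, distractor, snr_db):
    """Scale `distractor` so 10*log10(P_target / P_distractor) == snr_db, then add it to `target`."""
    p_target = np.mean(target ** 2)
    p_distractor = np.mean(distractor ** 2)
    gain = np.sqrt(p_target / (p_distractor * 10.0 ** (snr_db / 10.0)))
    return target + gain * distractor


def phase_randomize(x, rng):
    """Keep the magnitude spectrum of `x` but replace its phases with uniform random values.

    Assumption: a stand-in for the paper's phase-decorrelated distractor. Randomizing the
    phase of the whole signal preserves its energy spectrum but smears its temporal envelope.
    """
    spec = np.fft.rfft(x)
    phases = rng.uniform(-np.pi, np.pi, size=spec.shape)
    phases[0] = 0.0  # the DC bin of a real signal must stay real
    if len(x) % 2 == 0:
        phases[-1] = 0.0  # so must the Nyquist bin for even-length signals
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(x))


def frame_entropy(logits):
    """Shannon entropy (nats) of the softmax posterior for each frame; `logits` has shape (T, K)."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(np.exp(log_p) * log_p).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16_000  # wav2vec 2.0 checkpoints expect 16 kHz audio
    t = np.arange(fs) / fs
    target = np.sin(2 * np.pi * 220 * t)  # toy "target talker"
    distractor = rng.standard_normal(fs)  # toy stand-in for a competing talker
    shield = phase_randomize(distractor, rng)
    # SNR sweep: the same masker level schedule for the intact and the decorrelated distractor.
    for snr in (-10, -5, 0, 5, 10):
        intact_mix = mix_at_snr(target, distractor, snr)
        shield_mix = mix_at_snr(target, shield, snr)
        print(snr, round(float(np.std(intact_mix)), 3), round(float(np.std(shield_mix)), 3))
```

Each mixture in the sweep would then be passed through the acoustic model, and `frame_entropy` would be applied to the resulting (T, K) logits. Comparing the two entropy curves across SNR is what separates a semantic (informational) penalty from a purely level-driven (energetic) one.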

