Towards a generalized monaural and binaural auditory model for psychoacoustics and speech intelligibility
This work addresses the need of researchers in psychoacoustics and speech processing for a combined auditory model, but the contribution is incremental: it extends an existing monaural model with a simplified binaural stage.
The authors tackle the problem of integrating monaural and binaural auditory cues into a unified model for psychoacoustics and speech intelligibility. The result is a 5-channel monaural and binaural matrix feature decoder (BMFD) that extends an existing monaural model and is evaluated on a baseline database of monaural and binaural psychoacoustic experiments from the literature.
Auditory perception involves cues in the monaural auditory pathways as well as binaural cues based on differences between the ears. So far, auditory models have often focused on either monaural or binaural experiments in isolation. Although binaural models typically build upon stages of existing monaural models, only a few attempts have been made to extend a monaural model with a binaural stage that feeds a unified decision stage for monaural and binaural cues. In such approaches, the typical prototype of binaural processing has been the classical equalization-cancellation mechanism, which either involves signal-adaptive delays and provides a single-channel output, or can be implemented with tapped delays that provide a high-dimensional multichannel output. This contribution extends the (monaural) generalized envelope power spectrum model with a non-adaptive binaural stage that has only a few fixed output channels. The binaural stage mimics features of physiologically motivated hemispheric binaural processing using simplified signal processing stages, yielding a 5-channel monaural and binaural matrix feature "decoder" (BMFD). The back end of the existing monaural model is applied to the 5-channel BMFD output and calculates short-time envelope power and power features. The model is evaluated and discussed for a baseline database of monaural and binaural psychoacoustic experiments from the literature.
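To make the tapped-delay variant of the classical equalization-cancellation mechanism mentioned above more concrete, the following is a minimal sketch in Python. It is not the BMFD and not the authors' implementation. The function names (`shift`, `ec_tapped_delay`, `short_time_power`), the integer-sample delays, the omission of gain equalization and peripheral filtering, and the toy stimulus (a diotic 500 Hz tone in noise with an interaural delay of 8 samples) are all assumptions made for illustration.

```python
import numpy as np


def shift(x, d):
    """Delay x by d samples (d may be negative), zero-padding the gap."""
    y = np.zeros_like(x)
    if d > 0:
        y[d:] = x[:-d]
    elif d < 0:
        y[:d] = x[-d:]
    else:
        y[:] = x
    return y


def ec_tapped_delay(left, right, delays):
    """Cancellation residual left - delayed(right), one channel per fixed tap."""
    return np.stack([left - shift(right, d) for d in delays])


def short_time_power(x, win):
    """Mean power in consecutive, non-overlapping windows of `win` samples."""
    n = (x.shape[-1] // win) * win
    frames = x[..., :n].reshape(*x.shape[:-1], -1, win)
    return np.mean(frames ** 2, axis=-1)


# Toy stimulus (hypothetical values): diotic tone, noise delayed at the right ear.
fs = 16000
t = np.arange(fs // 10) / fs
target = np.sin(2 * np.pi * 500 * t)
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)
itd = 8  # interaural noise delay in samples (assumed)

left = noise + target
right = shift(noise, itd) + target

delays = np.arange(-16, 17)                 # fixed delay taps in samples
ec = ec_tapped_delay(left, right, delays)   # shape: (taps, samples)
power = short_time_power(ec, win=160)       # shape: (taps, frames)

# The tap with the smallest residual power compensates the noise delay.
best_delay = delays[np.argmin(power.mean(axis=1))]
print(best_delay)
```

The sketch shows why a tapped-delay implementation yields a high-dimensional output: every delay tap contributes its own channel (33 here), whereas the approach described above restricts the binaural stage to a few fixed channels before the short-time power features are computed.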