AS | Oct 6, 2021
Lower Interaural Coherence in Off-Signal Bands Impairs Binaural Detection
Bernhard Eurich, Jörg Encke, Stephan D. Ewert et al.
Differences in interaural phase configuration between a target and a masker can lead to substantial binaural unmasking. This effect is decreased for masking noises with an interaural time difference (ITD). Adding a second noise with an opposing ITD in most cases further reduces binaural unmasking. Thus far, modeling of these detection thresholds required both a mechanism for internal ITD compensation and an increased binaural bandwidth. An alternative explanation for the reduction is that unmasking is impaired by the lower interaural coherence in off-frequency regions caused by the second masker (Marquardt & McAlpine, 2009, JASA pp. EL177-EL182). Based on this hypothesis, the current work proposes a quantitative multi-channel model using monaurally derived peripheral filter bandwidths and an across-channel incoherence interference mechanism. This mechanism differs from wider filters in that it has no effect when the masker coherence is constant across frequency bands. Combined with a monaural energy discrimination pathway, the model predicts the differences between a single delayed noise and two opposingly delayed noises, as well as four other data sets. It helps resolve the inconsistency that explaining some data sets requires wide filters while explaining others requires narrow filters.
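To illustrate why a second masker with an opposing ITD lowers coherence in off-frequency bands, here is a minimal sketch. It assumes two independent, equal-power noises and the narrow-band limit, in which the interaural coherence magnitude becomes |cos(2πfτ)|. The function name and the 1 ms ITD are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def two_noise_coherence(freq_hz, itd_s):
    """Narrow-band interaural coherence magnitude for two independent,
    equal-power noises with ITDs of +itd_s and -itd_s (assumed setup).

    The cross-spectrum is S*exp(-j*2*pi*f*tau) + S*exp(+j*2*pi*f*tau)
    = 2*S*cos(2*pi*f*tau), and each ear's auto-spectrum is 2*S,
    so the normalized coherence is cos(2*pi*f*tau).
    """
    freq_hz = np.asarray(freq_hz, dtype=float)
    return np.abs(np.cos(2.0 * np.pi * freq_hz * itd_s))

# Hypothetical example: 1 ms opposing ITDs, bands around a 500 Hz target.
itd = 1e-3
bands = np.array([250.0, 375.0, 500.0, 625.0, 750.0])
coherence = two_noise_coherence(bands, itd)
for f, c in zip(bands, coherence):
    print(f"{f:6.0f} Hz  coherence = {c:.3f}")
```

With these assumed values the band at 500 Hz stays fully coherent while the bands at 250 Hz and 750 Hz become incoherent. A single delayed noise, by contrast, would have unit coherence in every narrow band.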
AS | Jul 1, 2021
Prediction of tone detection thresholds in interaurally delayed noise based on interaural phase difference fluctuations
Mathias Dietz, Jörg Encke, Kristin I. Bracklo et al.
Differences between the interaural phase of a noise and a target tone improve detection thresholds. The maximum masking release is obtained for detecting an antiphasic tone (Sπ) in diotic noise (N0). It has been shown in several studies that this benefit gradually declines as an interaural delay is applied to the N0Sπ complex. This decline has been attributed to the reduced interaural coherence of the noise. Here, we report detection thresholds for a 500 Hz tone in masking noise with up to 8 ms interaural delay and bandwidths from 25 to 1000 Hz. When reducing the noise bandwidth from 100 to 50 and 25 Hz, the masking release at 8 ms delay increases, as expected for increasing temporal coherence with decreasing bandwidth. For bandwidths of 100 to 1000 Hz, no significant difference was observed and detection thresholds with these noises have a delay dependence that is fully described by the temporal coherence imposed by the typical monaurally determined auditory filter bandwidth. A minimalistic binaural model is suggested based on interaural phase difference fluctuations without the assumption of delay lines.
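The bandwidth dependence described above can be sketched numerically. The sketch assumes an idealized noise with a rectangular spectrum, for which the envelope of the normalized autocorrelation at lag τ is |sinc(Bτ)|. It further assumes that the effective bandwidth is capped at one equivalent rectangular bandwidth (ERB) from the Glasberg and Moore (1990) formula. Both are simplifications for illustration and not the model proposed in the paper; the function names are hypothetical.

```python
import numpy as np

def erb_hz(center_hz):
    """Equivalent rectangular bandwidth after Glasberg and Moore (1990)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)

def coherence_envelope(bandwidth_hz, delay_s):
    """Envelope of the normalized autocorrelation of rectangular-band noise.

    np.sinc(x) is sin(pi*x)/(pi*x), so this evaluates |sinc(B*tau)|.
    """
    return np.abs(np.sinc(bandwidth_hz * delay_s))

def effective_coherence(noise_bw_hz, delay_s, center_hz=500.0):
    """Assumption: the auditory filter caps the bandwidth at one ERB."""
    bw = min(noise_bw_hz, erb_hz(center_hz))
    return float(coherence_envelope(bw, delay_s))

delay = 8e-3  # 8 ms interaural delay, as in the abstract
for bw in (25, 50, 100, 1000):
    print(f"{bw:5d} Hz noise: coherence ~ {effective_coherence(bw, delay):.3f}")
```

Under these assumptions the coherence at 8 ms rises as the bandwidth shrinks from 100 to 50 and 25 Hz, and it is identical for 100 Hz and 1000 Hz because both exceed the assumed filter bandwidth of roughly 79 Hz at 500 Hz.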
SD | Jun 30, 2021
Communication conditions in virtual acoustic scenes in an underground station
Ľuboš Hládek, Stephan D. Ewert, Bernhard U. Seeber
Underground stations are a common communication situation in towns: we talk with friends or colleagues, listen to announcements or shop for titbits while background noise and reverberation challenge communication. Here, we perform an acoustical analysis of two communication scenes in an underground station in Munich and test speech intelligibility. The acoustical conditions were measured in the station and are compared to simulations in the real-time Simulated Open Field Environment (rtSOFE). We compare binaural room impulse responses measured with an artificial head in the station to modeled impulse responses for free-field auralization via 60 loudspeakers in the rtSOFE. We used the image source method to model early reflections and a set of multi-microphone recordings to model late reverberation. The first communication scene consists of 12 equidistant (1.6 m) horizontally spaced source positions around a listener, simulating different direction-dependent spatial unmasking conditions. The second scene mimics an approaching speaker across six radially spaced source positions (from 1 m to 10 m) with varying direct sound level and thus direct-to-reverberant energy. The acoustic parameters of the underground station show a moderate amount of reverberation (T30 in octave bands was between 2.3 s and 0.6 s and early-decay times between 1.46 s and 0.46 s). The binaural and energetic parameters of the auralization closely matched the measurement. Measured speech reception thresholds were within the error of the speech test, allowing us to conclude that the auralized simulation reproduces acoustic and perceptually relevant parameters for speech intelligibility with high accuracy.
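As background for the reported T30 and early-decay times, the following sketch shows one common way to estimate them from an impulse response: Schroeder backward integration followed by a line fit. It is a generic illustration and not the authors' analysis code. The synthetic impulse response, the sample rate, and the function names are assumptions, and octave-band filtering is omitted.

```python
import numpy as np

def schroeder_decay_db(ir):
    """Energy decay curve in dB via backward integration of the squared IR."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(edc / edc[0])

def decay_time(ir, fs, start_db, end_db):
    """Fit a line to the decay curve between start_db and end_db and
    extrapolate the fitted slope to a 60 dB decay."""
    edc_db = schroeder_decay_db(ir)
    t = np.arange(len(edc_db)) / fs
    mask = (edc_db <= start_db) & (edc_db >= end_db)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope

# Synthetic test signal (assumption): a pure exponential amplitude decay
# that drops by 60 dB within exactly 1 s.
fs = 8000
t = np.arange(int(3.0 * fs)) / fs
synthetic_ir = np.exp(-np.log(1000.0) * t / 1.0)

t30 = decay_time(synthetic_ir, fs, -5.0, -35.0)   # T30: fit from -5 to -35 dB
edt = decay_time(synthetic_ir, fs, 0.0, -10.0)    # EDT: fit from 0 to -10 dB
print(f"T30 = {t30:.2f} s, EDT = {edt:.2f} s")
```

For a purely exponential decay both values coincide. In a real room they can differ, as in the station measurements above, where the early-decay times are shorter than T30.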
AS | Jun 30, 2021
Effect of acoustic scene complexity and visual scene representation on auditory perception in virtual audio-visual environments
Stefan Fichna, Thomas Biberger, Bernhard U. Seeber et al.
In daily life, social interaction and acoustic communication often take place in complex acoustic environments (CAE) with a variety of interfering sounds and reverberation. For hearing research and the evaluation of hearing systems, simulated CAEs using virtual reality techniques have gained interest in the context of ecological validity. In the current study, the effect of scene complexity and visual representation of the scene on psychoacoustic measures like sound source location, distance perception, loudness, speech intelligibility, and listening effort in a virtual audio-visual environment was investigated. A 3-dimensional, 86-channel loudspeaker array was used to render the sound field in combination with or without a head-mounted display (HMD) to create an immersive stereoscopic visual representation of the scene. The scene consisted of a ring of eight (virtual) loudspeakers which played a target speech stimulus and nonsense speech interferers in several spatial conditions. Either an anechoic (snowy outdoor scenery) or echoic environment (loft apartment) with a reverberation time (T60) of about 1.5 s was simulated. In addition to varying the number of interferers, scene complexity was varied by assessing the psychoacoustic measures in isolated consecutive measurements or simultaneously. Results showed no significant effect of wearing the HMD on the data. Loudness and distance perception showed significantly different results when they were measured simultaneously instead of consecutively in isolation. The advantage of the suggested setup is that it can be directly transferred to a corresponding real room, enabling a 1:1 comparison and verification of the perception experiments in the real and virtual environment.
AS | Jun 30, 2021
Computationally efficient spatial rendering of late reverberation in virtual acoustic environments
Christoph Kirsch, Josef Poppitz, Torben Wendt et al.
For 6-DOF (degrees of freedom) interactive virtual acoustic environments (VAEs), the spatial rendering of diffuse late reverberation in addition to early (specular) reflections is important. In the interest of computational efficiency, the acoustic simulation of the late reverberation can be simplified by using a limited number of spatially distributed virtual reverb sources (VRS) each radiating incoherent signals. A sufficient number of VRS is needed to approximate spatially anisotropic late reverberation, e.g., in a room with inhomogeneous distribution of absorption at the boundaries. Here, a highly efficient and perceptually plausible method to generate and spatially render late reverberation is suggested, extending the room acoustics simulator RAZR [Wendt et al., J. Audio Eng. Soc., 62, 11 (2014)]. The room dimensions and frequency-dependent absorption coefficients at the wall boundaries are used to determine the parameters of a physically-based feedback delay network (FDN) to generate the incoherent VRS signals. The VRS are spatially distributed around the listener with weighting factors representing the spatially subsampled distribution of absorption coefficients on the wall boundaries. The minimum number of VRS required to be perceptually distinguishable from the maximum (reference) number of 96 VRS was assessed in a listening test conducted with a spherical loudspeaker array within an anechoic room. For the resulting low numbers of VRS suited for spatial rendering, optimal physically-based parameter choices for the FDN are discussed.
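To make the feedback delay network (FDN) idea concrete, here is a minimal broadband sketch that returns one output signal per delay line, loosely corresponding to one signal per VRS. It is not the RAZR implementation. The delay lengths, the Householder feedback matrix, and the single frequency-independent T60 are assumptions for illustration; deriving the parameters from room dimensions and frequency-dependent absorption, and weighting the VRS, are not shown.

```python
import numpy as np

def fdn_impulse_response(delays, t60_s, fs, n_samples):
    """Impulse response of a small FDN, one output channel per delay line."""
    delays = np.asarray(delays, dtype=int)
    n = len(delays)
    # Householder matrix: orthogonal, so the loop itself is lossless.
    feedback = np.eye(n) - (2.0 / n) * np.ones((n, n))
    # Per-line attenuation so that every line decays by 60 dB in t60_s.
    gains = 10.0 ** (-3.0 * delays / (fs * t60_s))
    buffers = [np.zeros(d) for d in delays]
    idx = np.zeros(n, dtype=int)
    out = np.zeros((n_samples, n))
    for k in range(n_samples):
        # Read the oldest sample of each circular delay line.
        taps = np.array([buffers[i][idx[i]] for i in range(n)])
        out[k] = taps
        x = 1.0 if k == 0 else 0.0  # unit impulse fed into all lines
        fed = feedback @ (gains * taps) + x
        for i in range(n):
            buffers[i][idx[i]] = fed[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return out

fs = 8000
ir = fdn_impulse_response([149, 211, 263, 293], t60_s=0.5, fs=fs, n_samples=4800)

def window_level_db(x, start, stop):
    return 10.0 * np.log10(np.sum(x[start:stop] ** 2))

# Level drop between two 0.1 s windows whose starts are 0.5 s apart.
drop_db = window_level_db(ir, 0, 800) - window_level_db(ir, 4000, 4800)
print(f"level drop over 0.5 s: {drop_db:.1f} dB")
```

Because every line is attenuated by the same factor per sample, the level drop between the two windows should be close to 60 dB for the assumed T60 of 0.5 s. The outputs of different delay lines are only approximately mutually incoherent in such a small network.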
AS | Jun 30, 2021
Spatial resolution of late reverberation in virtual acoustic environments
Christoph Kirsch, Josef Poppitz, Torben Wendt et al.
Late reverberation involves the superposition of many sound reflections resulting in a diffuse sound field. Since the spatially resolved perception of individual diffuse reflections is impossible, simplifications can potentially be made for modelling late reverberation in room acoustics simulations with reduced spatial resolution. Such simplifications are desired for interactive, real-time virtual acoustic environments with applications in hearing research and for the evaluation of hearing supportive devices. In this context, the number and spatial arrangement of loudspeakers used for playback additionally affect spatial resolution. The current study assessed the minimum number of spatially evenly distributed virtual late reverberation sources required to perceptually approximate spatially highly resolved isotropic and anisotropic late reverberation and to technically approximate a spherically isotropic diffuse sound field. The spatial resolution of the rendering was systematically reduced by using subsets of the loudspeakers of an 86-channel spherical loudspeaker array in an anechoic chamber. It was tested whether listeners can distinguish lower spatial resolutions for the rendering of late reverberation from the highest achievable spatial resolution in different simulated rooms. Rendering of early reflections was kept fixed. The coherence of the sound field across a pair of microphones at ear and behind-the-ear hearing device distance was assessed to separate the effects of number of virtual sources and loudspeaker array geometry. Results show that between 12 and 24 reverberation sources are required.
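The coherence measure mentioned above can be illustrated with the textbook result for a spherically isotropic diffuse field, where the coherence between two omnidirectional points at distance d is sin(kd)/(kd). The sketch compares this reference with a finite number of incoherent, equal-power plane-wave sources on a Fibonacci sphere. The microphone distances (0.17 m and 0.012 m), the source layout, and the function names are assumptions, and head shadowing is ignored.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def diffuse_coherence(freq_hz, distance_m):
    """Ideal diffuse-field coherence sin(kd)/(kd) with k = 2*pi*f/c."""
    x = 2.0 * np.asarray(freq_hz, dtype=float) * distance_m / SPEED_OF_SOUND
    return np.sinc(x)  # np.sinc(x) = sin(pi*x)/(pi*x)

def fibonacci_sphere(n):
    """n roughly evenly distributed unit vectors on a sphere."""
    i = np.arange(n)
    z = 1.0 - (2.0 * i + 1.0) / n
    phi = i * np.pi * (3.0 - np.sqrt(5.0))  # golden-angle increments
    r = np.sqrt(1.0 - z ** 2)
    return np.column_stack((r * np.cos(phi), r * np.sin(phi), z))

def discrete_coherence(freq_hz, distance_m, n_sources):
    """Real part of the coherence for n incoherent, equal-power plane waves
    with the microphone pair aligned with the x-axis."""
    directions = fibonacci_sphere(n_sources)
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND
    return float(np.mean(np.cos(k * distance_m * directions[:, 0])))

freq = 1000.0
for d in (0.17, 0.012):  # assumed ear and hearing-device microphone spacing
    ideal = float(diffuse_coherence(freq, d))
    for n in (6, 12, 24, 96):
        print(f"d={d} m, N={n:3d}: {discrete_coherence(freq, d, n):+.3f} (ideal {ideal:+.3f})")
```

At the small spacing the coherence stays near one regardless of the number of sources, so the larger ear distance is the more demanding case for a sparse source layout.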
AS | Jun 29, 2021
Towards a generalized monaural and binaural auditory model for psychoacoustics and speech intelligibility
Thomas Biberger, Stephan D. Ewert
Auditory perception involves cues in the monaural auditory pathways as well as binaural cues based on differences between the ears. So far auditory models have often focused on either monaural or binaural experiments in isolation. Although binaural models typically build upon stages of (existing) monaural models, only a few attempts have been made to extend a monaural model by a binaural stage using a unified decision stage for monaural and binaural cues. In such approaches, a typical prototype of binaural processing has been the classical equalization-cancelation mechanism, which either involves signal-adaptive delays and provides a single channel output or can be implemented with tapped delays providing a high-dimensional multichannel output. This contribution extends the (monaural) generalized envelope power spectrum model by a non-adaptive binaural stage with only a few, fixed output channels. The binaural stage resembles features of physiologically motivated hemispheric binaural processing, as simplified signal processing stages, yielding a 5-channel monaural and binaural matrix feature "decoder" (BMFD). The back end of the existing monaural model is applied to the 5-channel BMFD output and calculates short-time envelope power and power features. The model is evaluated and discussed for a baseline database of monaural and binaural psychoacoustic experiments from the literature.
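For readers unfamiliar with the classical equalization-cancellation idea referenced above, here is a deliberately idealized sketch. It is not the BMFD stage proposed in the paper. It assumes a diotic noise masker with an antiphasic tone, so no equalization delay is needed, and it omits the internal noise that limits cancellation in actual EC models. The signal parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs

noise = rng.standard_normal(fs)                    # diotic masker (same in both ears)
tone = 0.05 * np.sin(2.0 * np.pi * 500.0 * t)      # weak target tone (assumed 500 Hz)

left = noise + tone
right = noise - tone                               # antiphasic target

def power_db(x):
    return 10.0 * np.log10(np.mean(x ** 2))

# Monaural view: the tone is far below the noise in either ear.
snr_single_ear_db = power_db(tone) - power_db(noise)

# Cancellation step: subtracting the ears removes the diotic noise
# and leaves twice the antiphasic tone.
residual = left - right

print(f"single-ear SNR: {snr_single_ear_db:.1f} dB")
print(f"residual equals 2 * tone: {np.allclose(residual, 2.0 * tone)}")
```

In this noise-free idealization the masker cancels completely, which is why practical models add internal noise or, as in the abstract, use a different binaural stage with a small number of fixed output channels.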