Piotr Majdak

SD
8 papers
224 citations
Novelty 34%
AI Score 21

8 Papers

NA Jun 1, 2016
A-priori mesh grading for the numerical calculation of the head-related transfer functions

Harald Ziegelwanger, Wolfgang Kreuzer, Piotr Majdak

Head-related transfer functions (HRTFs) describe the directional filtering of the incoming sound caused by the morphology of a listener's head and pinnae. When an accurate model of a listener's morphology exists, HRTFs can be calculated numerically with the boundary element method (BEM). However, the general recommendation to model the head and pinnae with at least six elements per wavelength makes the BEM a time-consuming procedure when calculating HRTFs for the full audible frequency range. In this study, a mesh preprocessing algorithm is proposed, viz., a-priori mesh grading, which significantly reduces the computational costs of the HRTF calculation process. The mesh grading algorithm deliberately violates the recommendation of at least six elements per wavelength in certain regions of the head and pinnae and varies the size of elements gradually according to an a-priori defined grading function. The evaluation of the algorithm involved HRTFs calculated for various geometric objects, including meshes of three human listeners, and various grading functions. The numerical accuracy and the predicted sound-localization performance of the calculated HRTFs were analyzed. A-priori mesh grading appeared to be suitable for the numerical calculation of HRTFs in the full audible frequency range and outperformed uniform meshes in terms of numerical errors, perception-based predictions of sound-localization performance, and computational costs.
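To give a sense of why the six-elements-per-wavelength recommendation is costly at high frequencies, the rule can be turned into a maximum element edge length. The following is a minimal sketch, not taken from the paper: the function name `max_edge_length` and the speed of sound of 343 m/s are assumptions made for illustration.

```python
# Illustrative sketch only: names and the speed-of-sound value are assumptions,
# not details from the paper.
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees Celsius


def max_edge_length(frequency_hz, elements_per_wavelength=6, c=SPEED_OF_SOUND):
    """Largest element edge (in meters) that still satisfies the
    'N elements per wavelength' rule at the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    wavelength = c / frequency_hz  # lambda = c / f
    return wavelength / elements_per_wavelength


if __name__ == "__main__":
    for f in (1_000, 10_000, 20_000):
        print(f"{f} Hz: {max_edge_length(f) * 1000:.2f} mm")
```

Under these assumptions, the permitted edge length shrinks in inverse proportion to frequency, so a uniform mesh sized for the top of the audible range needs very small elements everywhere. This is the cost that a graded mesh aims to reduce.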

AS Jul 5, 2021
A comparative study of eight human auditory models of monaural processing

Alejandro Osses Vecchi, Léo Varnet, Laurel H. Carney et al.

A number of auditory models have been developed using diverging approaches, either physiological or perceptual, but they share comparable stages of signal processing, as they are inspired by the same constitutive parts of the auditory system. We compare eight monaural models that are openly accessible in the Auditory Modelling Toolbox. We discuss the considerations required to make the model outputs comparable to each other, as well as the results for the following model processing stages or their equivalents: Outer and middle ear, cochlear filter bank, inner hair cell, auditory nerve synapse, cochlear nucleus, and inferior colliculus. The discussion includes a list of recommendations for future applications of auditory models.

SP Jun 9, 2021
Time-Frequency Phase Retrieval for Audio -- The Effect of Transform Parameters

Andrés Marafioti, Nicki Holighaus, Piotr Majdak

In audio processing applications, phase retrieval (PR) is often performed from the magnitude of short-time Fourier transform (STFT) coefficients. Although PR performance has been observed to depend on the considered STFT parameters and audio data, the extent of this dependence has not been systematically evaluated yet. To address this, we studied the performance of three PR algorithms for various types of audio content and various STFT parameters such as redundancy, time-frequency ratio, and the type of window. The quality of PR was studied in terms of objective difference grade and signal-to-noise ratio of the STFT magnitude, to provide auditory- and signal-based quality assessments. Our results show that PR quality improved with increasing redundancy, with a strong relevance of the time-frequency ratio. The effect of the audio content was smaller but still observable. The effect of the window was only significant for one of the PR algorithms. Interestingly, for a good PR quality, each of the three algorithms required a different set of parameters, demonstrating the relevance of individual parameter sets for a fair comparison across PR algorithms. Based on these results, we developed guidelines for optimizing STFT parameters for a given application.
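Two of the quantities named in this abstract can be illustrated with a short sketch. It assumes a common convention in which STFT redundancy is the number of frequency channels divided by the hop size, and it computes a signal-to-noise ratio between a reference and an estimated magnitude. The function names and both definitions are assumptions made for illustration, not confirmed details of the paper.

```python
import math

# Illustrative sketch only: both definitions below are common conventions and
# are assumptions here, not necessarily the exact ones used in the paper.


def stft_redundancy(num_channels, hop_size):
    """Redundancy of an STFT: coefficients produced per input sample."""
    if num_channels <= 0 or hop_size <= 0:
        raise ValueError("num_channels and hop_size must be positive")
    return num_channels / hop_size


def magnitude_snr_db(reference, estimate):
    """SNR in dB between two equally long flat sequences of STFT magnitudes."""
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # identical magnitudes
    return 10.0 * math.log10(signal_energy / error_energy)


if __name__ == "__main__":
    print(stft_redundancy(num_channels=2048, hop_size=512))
    print(magnitude_snr_db([1.0, 1.0], [0.9, 0.9]))
```

With these definitions, halving the hop size at a fixed number of channels doubles the redundancy, which is the parameter the abstract reports as improving PR quality.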

SD May 11, 2020
GACELA -- A generative adversarial context encoder for long audio inpainting

Andrés Marafioti, Piotr Majdak, Nicki Holighaus et al.

We introduce GACELA, a generative adversarial network (GAN) designed to restore missing musical audio data with a duration ranging from hundreds of milliseconds to a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other signal parts, GACELA addresses the inpainting of long gaps in two aspects. First, it considers various time scales of audio information by relying on five parallel discriminators with increasing resolution of receptive fields. Second, it is conditioned not only on the available information surrounding the gap, i.e., the context, but also on the latent variable of the conditional GAN. This addresses the inherent multi-modality of audio inpainting at such long gaps and provides the option of user-defined inpainting. GACELA was tested in listening tests on music signals of varying complexity and gap durations ranging from 375 ms to 1500 ms. While our subjects were often able to detect the inpaintings, the severity of the artifacts decreased from unacceptable to mildly disturbing. GACELA represents a framework capable of integrating future improvements such as processing of more auditory-related features or more explicit musical features.

SD Feb 11, 2019
Adversarial Generation of Time-Frequency Features with application in audio synthesis

Andrés Marafioti, Nicki Holighaus, Nathanaël Perraudin et al.

Time-frequency (TF) representations provide powerful and intuitive features for the analysis of time series such as audio. Still, generative modeling of audio in the TF domain is a subtle matter. Consequently, neural audio synthesis widely relies on directly modeling the waveform, and previous attempts at unconditionally synthesizing audio from neurally generated invertible TF features still struggle to produce audio at satisfying quality. In this article, focusing on the short-time Fourier transform, we discuss the challenges that arise in audio synthesis based on generated invertible TF features and how to overcome them. We demonstrate the potential of deliberate generative TF modeling by training a generative adversarial network (GAN) on short-time Fourier features. We show that by applying our guidelines, our TF-based network was able to outperform a state-of-the-art GAN generating waveforms directly, despite the similar architecture of the two networks.

SD Oct 29, 2018
Audio inpainting of music by means of neural networks

Andrés Marafioti, Nicki Holighaus, Piotr Majdak et al.

We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained on audio signals containing music and musical instruments, separately, with 64-ms long gaps. The input to the DNN was the context, i.e., the signal surrounding the gap, transformed into time-frequency (TF) coefficients. Our results were compared to those obtained from a reference method based on linear predictive coding (LPC). For music, our DNN significantly outperformed the reference method, demonstrating a generally good usability of the proposed DNN structure for inpainting complex audio signals like music.

SD Jul 22, 2016
Inpainting of long audio segments with similarity graphs

Nathanaël Perraudin, Nicki Holighaus, Piotr Majdak et al.

We present a novel method for the compensation of long duration data loss in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A suitable candidate segment for the substitution of the lost content is proposed by an intuitive optimization scheme and smoothly inserted into the gap, i.e. the lost or distorted signal region. Extensive listening tests show that the proposed algorithm provides highly promising results when applied to a variety of real-world music signals.

SD Jun 11, 2015
Channel Interaction and Current Level Affect Across-Electrode Integration of Interaural Time Differences in Bilateral Cochlear-Implant Listeners

Katharina Egger, Piotr Majdak, Bernhard Laback

Sensitivity to interaural time differences (ITDs) is important for sound localization. Normal-hearing listeners benefit from across-frequency processing, as seen with improved ITD thresholds when consistent ITD cues are presented over a range of frequency channels compared to when ITD information is only presented in a single frequency channel. This study aimed to clarify whether cochlear-implant (CI) listeners can make use of similar processing when being stimulated with multiple interaural electrode pairs transmitting consistent ITD information. ITD thresholds for unmodulated, 100-pulse-per-second pulse trains were measured in seven bilateral CI listeners using research interfaces. Consistent ITDs were presented at either one or two electrode pairs at different current levels, allowing for comparisons at either constant level per component electrode or equal overall loudness. Different tonotopic distances between the pairs were tested in order to clarify the potential influence of channel interaction. Comparison of ITD thresholds between double pairs and the respective single pairs revealed systematic effects of tonotopic separation and current level. At constant levels, performance with double-pair stimulation improved compared to single-pair stimulation, but only for large tonotopic separation. Comparisons at equal overall loudness revealed no benefit from presenting ITD information at two electrode pairs for any tonotopic spacing. Irrespective of electrode-pair configuration, ITD sensitivity improved with increasing current level. Hence, the improved ITD sensitivity for double pairs found for a large tonotopic separation and constant current levels seems to be due to increased loudness. The overall data suggest that CI listeners can benefit from combining consistent ITD information across multiple electrodes, provided that stimulus levels are sufficient and that the stimulating electrode pairs are widely spaced.