AS Jun 20, 2023
Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses
Mirco Pezzoli, Fabio Antonacci, Augusto Sarti
Recently, deep learning and machine learning approaches have been widely employed for various applications in acoustics. Nonetheless, in the area of sound field processing and reconstruction, classic methods based on the solutions of the wave equation are still widespread. Recently, physics-informed neural networks have been proposed as a deep learning paradigm for solving the partial differential equations that govern physical phenomena, bridging the gap between purely data-driven and model-based methods. Here, we exploit physics-informed neural networks to reconstruct the early part of missing room impulse responses in a uniform linear array. This methodology allows us to exploit the underlying law of acoustics, i.e., the wave equation, forcing the neural network to generate physically meaningful solutions given only a limited number of data points. The results on real measurements show that the proposed model achieves accurate reconstruction and performance in line with state-of-the-art deep-learning and compressed sensing techniques while maintaining a lightweight architecture.
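To illustrate the physics constraint mentioned in this abstract, the sketch below evaluates the residual of the 1D wave equation, p_tt - c^2 p_xx, on a sampled field using central finite differences. It only shows the kind of quantity a physics-informed loss could penalize. The paper's network, its differentiation scheme and its loss weighting are not reproduced here, and the function name `wave_residual`, the grid, the toy plane waves and the speed of sound are all assumptions made for illustration.

```python
import numpy as np

def wave_residual(p, dx, dt, c):
    """Finite-difference residual of p_tt - c^2 * p_xx on interior points.

    p is a sampled pressure field of shape (n_positions, n_times).
    """
    center = p[1:-1, 1:-1]
    p_xx = (p[2:, 1:-1] - 2.0 * center + p[:-2, 1:-1]) / dx**2
    p_tt = (p[1:-1, 2:] - 2.0 * center + p[1:-1, :-2]) / dt**2
    return p_tt - c**2 * p_xx

c = 343.0                        # assumed speed of sound in m/s
x = np.linspace(0.0, 1.0, 201)   # hypothetical positions along the array axis
t = np.linspace(0.0, 2e-3, 401)  # hypothetical time axis
X, T = np.meshgrid(x, t, indexing="ij")
k = 4.0 * np.pi                  # wavenumber of the toy plane wave

physical = np.sin(k * (X - c * T))           # satisfies the wave equation
wrong_speed = np.sin(k * (X - 0.5 * c * T))  # travels at c/2, so it does not

dx, dt = x[1] - x[0], t[1] - t[0]
loss_physical = np.mean(wave_residual(physical, dx, dt, c) ** 2)
loss_wrong = np.mean(wave_residual(wrong_speed, dx, dt, c) ** 2)
print(loss_physical < loss_wrong)
```

In a physics-informed setting, a term of this kind would typically be added to the data-fitting loss, so that candidate fields inconsistent with the wave equation are penalized even where no measurement is available.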
AS Jul 26, 2024
A Physics-Informed Neural Network-Based Approach for the Spatial Upsampling of Spherical Microphone Arrays
Federico Miotello, Ferdinando Terminiello, Mirco Pezzoli et al.
Spherical microphone arrays are convenient tools for capturing the spatial characteristics of a sound field. However, achieving superior spatial resolution requires arrays with numerous capsules, consequently leading to expensive devices. To address this issue, we present a method for spatially upsampling spherical microphone arrays with a limited number of capsules. Our approach exploits a physics-informed neural network with Rowdy activation functions, leveraging physical constraints to provide high-order microphone array signals, starting from low-order devices. Results show that, within its domain of application, our approach outperforms a state-of-the-art method based on signal processing for spherical microphone array upsampling.
AS Dec 14, 2023
Reconstruction of Sound Field through Diffusion Models
Federico Miotello, Luca Comanducci, Mirco Pezzoli et al.
Reconstructing the sound field in a room is an important task for several applications, such as sound control and augmented (AR) or virtual reality (VR). In this paper, we propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms with a focus on the modal frequency range. We introduce, for the first time, the use of a conditional Denoising Diffusion Probabilistic Model (DDPM) trained in order to reconstruct the sound field (SF-Diff) over an extended domain. The architecture is devised in order to be conditioned on a set of limited available measurements at different frequencies and generate the sound field in target, unknown, locations. The results show that SF-Diff is able to provide accurate reconstructions, outperforming a state-of-the-art baseline based on kernel interpolation.
AS Dec 15, 2023
Toward Deep Drum Source Separation
Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini et al.
In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.
AS Feb 1, 2024
Room Transfer Function Reconstruction Using Complex-valued Neural Networks and Irregularly Distributed Microphones
Francesca Ronchini, Luca Comanducci, Mirco Pezzoli et al.
Reconstructing the room transfer functions needed to calculate the complex sound field in a room has several important real-world applications. However, an impractical number of microphones is often required. Recently, in addition to classical signal processing methods, deep learning techniques have been applied to reconstruct the room transfer function starting from a very limited set of measurements at scattered points in the room. In this paper, we employ complex-valued neural networks to estimate room transfer functions in the frequency range of the first room resonances, using a few irregularly distributed microphones. To the best of our knowledge, this is the first time that complex-valued neural networks are used to estimate room transfer functions. To analyze the benefits of applying complex-valued optimization to the considered task, we compare the proposed technique with a state-of-the-art kernel-based signal processing approach for sound field reconstruction, showing that the proposed technique exhibits relevant advantages in terms of phase accuracy and overall quality of the reconstructed sound field. For informative purposes, we also compare the model with a similarly-structured data-driven approach that, however, applies a real-valued neural network to reconstruct only the magnitude of the sound field.
SD Mar 31, 2021
Near field Acoustic Holography on arbitrary shapes using Convolutional Neural Network
Marco Olivieri, Mirco Pezzoli, Fabio Antonacci et al.
Near-field Acoustic Holography (NAH) is a well-known problem aimed at estimating the vibrational velocity field of a structure by means of acoustic measurements. In this paper, we propose a NAH technique based on a Convolutional Neural Network (CNN). The devised CNN predicts the vibrational field on the surface of arbitrarily shaped plates (violin plates) with orthotropic material properties from a limited number of measurements. In particular, the architecture, named Super Resolution CNN (SRCNN), is able to estimate the vibrational field with a higher spatial resolution compared to the input pressure. The pressure and velocity datasets have been generated through Finite Element Method simulations. We validate the proposed method by comparing the estimates with the synthesized ground truth and with a state-of-the-art technique. Moreover, we evaluate the robustness of the devised network against noisy input data.
SD Feb 14, 2021
Parametric Optimization of Violin Top Plates using Machine Learning
Davide Salvi, Sebastian Gonzalez, Fabio Antonacci et al.
We recently developed a neural network that receives as input the geometrical and mechanical parameters that define a violin top plate and gives as output its first ten eigenfrequencies computed in free boundary conditions. In this manuscript, we use the network to optimize several error functions, with the goal of analyzing the relationship between the eigenspectrum problem for violin top plates and their geometry. First, we focus on the violin outline. Given a vibratory feature, we find the plate geometry that best achieves it. Second, we investigate whether, from the vibrational point of view, a change in the outline shape can be compensated by one in the thickness distribution and vice versa. Finally, we analyze how to modify the violin shape to keep its response constant as its material properties vary. This is an original technique in musical acoustics, where artificial intelligence is not widely used yet. It allows us to both compute the vibrational behavior of an instrument from its geometry and optimize its shape for a given response. Furthermore, this method can be of great help to violin makers, who can thus easily understand the effects of the geometry changes in the violins they build, shedding light on one of the most relevant and, at the same time, least understood aspects of the construction process of musical instruments.
CE Feb 3, 2021
A Data-Driven Approach to Violin Making
Sebastian Gonzalez, Davide Salvi, Daniel Baeza et al.
Of all the characteristics of a violin, those that concern its shape are probably the most important ones, as the violin maker has complete control over them. Contemporary violin making, however, is still based more on tradition than understanding, and a definitive scientific study of the specific relations that exist between shape and vibrational properties is yet to come and sorely missed. In this article, using standard statistical learning tools, we show that the modal frequencies of violin tops can, in fact, be predicted from geometric parameters, and that artificial intelligence can be successfully applied to traditional violin making. We also study how modal frequencies vary with the thicknesses of the plate (a process often referred to as plate tuning) and discuss the complexity of this dependency. Finally, we propose a predictive tool for plate tuning, which takes into account material and geometric parameters.
AS Apr 30, 2020
Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching
Alessandro Ilic Mezza, Emanuël A. P. Habets, Meinard Müller et al.
The performance of machine learning algorithms is known to be negatively affected by possible mismatches between training (source) and test (target) data distributions. In fact, this problem emerges whenever an acoustic scene classification system which has been trained on data recorded by a given device is applied to samples acquired under different acoustic conditions or captured by mismatched recording devices. To address this issue, we propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset. This model-agnostic approach is devised to adapt audio samples from unseen devices before they are fed to a pre-trained classifier, thus avoiding any further learning phase. Using the DCASE 2018 Task 1-B development dataset, we show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
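The alignment of first- and second-order statistics described in this abstract can be sketched as follows. This is a minimal illustration, assuming time-frequency features of shape (clips, bands, frames), statistics pooled over all clips and frames, and the standard deviation as the second-order statistic. The function names `bandwise_stats` and `match_bandwise_stats` and the synthetic data are hypothetical and not taken from the paper.

```python
import numpy as np

def bandwise_stats(features):
    """Per-band mean and standard deviation.

    features has shape (n_clips, n_bands, n_frames); statistics are pooled
    over clips and frames, which is an assumption of this sketch.
    """
    mean = features.mean(axis=(0, 2), keepdims=True)
    std = features.std(axis=(0, 2), keepdims=True)
    return mean, std

def match_bandwise_stats(target, source_mean, source_std, eps=1e-8):
    """Standardize each target band, then rescale it to the source statistics."""
    target_mean, target_std = bandwise_stats(target)
    standardized = (target - target_mean) / (target_std + eps)
    return standardized * source_std + source_mean

rng = np.random.default_rng(0)
n_bands = 4
# Synthetic stand-ins for features recorded by two different devices.
source = rng.normal(loc=-3.0, scale=1.0, size=(8, n_bands, 50))
band_gain = np.linspace(0.5, 2.0, n_bands).reshape(1, n_bands, 1)
target = band_gain * rng.normal(loc=1.0, scale=1.0, size=(6, n_bands, 50))

source_mean, source_std = bandwise_stats(source)
adapted = match_bandwise_stats(target, source_mean, source_std)
adapted_mean, adapted_std = bandwise_stats(adapted)
```

Because only the input features are transformed, the adapted samples could be fed to an already trained classifier without any further learning phase, which matches the model-agnostic framing of the abstract.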
AS Feb 3, 2020
Time Difference of Arrival Estimation from Frequency-Sliding Generalized Cross-Correlations Using Convolutional Neural Networks
Luca Comanducci, Maximo Cobos, Fabio Antonacci et al.
The interest in deep learning methods for solving traditional signal processing tasks has been steadily growing in recent years. Time delay estimation (TDE) in adverse scenarios is a challenging problem, where classical approaches based on generalized cross-correlations (GCCs) have been widely used for decades. Recently, the frequency-sliding GCC (FS-GCC) was proposed as a novel technique for TDE based on a sub-band analysis of the cross-power spectrum phase, providing a structured two-dimensional representation of the time delay information contained across different frequency bands. Inspired by deep-learning-based image denoising solutions, we propose in this paper the use of convolutional neural networks (CNNs) to learn the time-delay patterns contained in FS-GCCs extracted in adverse acoustic conditions. Our experiments confirm that the proposed approach provides excellent TDE performance while being able to generalize to different room and sensor setups.
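For context, the classical baseline mentioned in this abstract can be sketched with a GCC using phase transform (PHAT) weighting, which keeps only the cross-power spectrum phase. This is not the FS-GCC or the CNN of the paper; the function name `gcc_phat`, the signal length and the noise level are assumptions chosen for illustration.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """Estimate the delay of x relative to y in samples (positive: x lags y)."""
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep the phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the array covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

rng = np.random.default_rng(0)
true_delay = 5
y = rng.normal(size=1024)                                    # reference signal
x = np.concatenate((np.zeros(true_delay), y[:-true_delay]))  # delayed copy
x = x + 0.05 * rng.normal(size=x.size)                       # mild sensor noise
estimated_delay = gcc_phat(x, y, max_lag=32)
print(estimated_delay)
```

The FS-GCC described in the abstract applies this kind of phase analysis per sub-band, giving a two-dimensional delay-versus-frequency representation rather than the single correlation vector computed here.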
IT Oct 14, 2016
A Geometrical-Statistical approach to outlier removal for TDOA measurements
Marco Compagnoni, Alessia Pini, Antonio Canclini et al.
The curse of outlier measurements in estimation problems is a well-known issue in a variety of fields. Therefore, outlier removal procedures, which enable the identification of spurious measurements within a set, have been developed for many different scenarios and applications. In this paper, we propose a statistically motivated outlier removal algorithm for time differences of arrival (TDOAs), or equivalently range differences (RD), acquired at sensor arrays. The method exploits the TDOA-space formalism and works by only knowing relative sensor positions. As the proposed method is completely independent from the application for which measurements are used, it can be reliably used to identify outliers within a set of TDOA/RD measurements in different fields (e.g. acoustic source localization, sensor synchronization, radar, remote sensing, etc.). The proposed outlier removal algorithm is validated by means of synthetic simulations and real experiments.
IT Jun 27, 2016
On the Statistical Model of Source Localization based on Range Difference Measurements
Marco Compagnoni, Roberto Notari, Fabio Antonacci et al.
In this work we study the statistical model of source localization based on Range Difference measurements. We investigate the case of planar localization of a source using a minimal configuration of three non-aligned receivers. Our analysis is based on a previous work by the same authors concerning the localization in a noiseless scenario. As the set of feasible measurements is a semialgebraic variety, this investigation makes use of techniques from Algebraic Statistics and Information Geometry.
AG Apr 27, 2016
The algebro-geometric study of range maps
Marco Compagnoni, Roberto Notari, Andrea Alessandro Ruggiu et al.
Localizing a radiant source is a widespread problem in many scientific and technological research areas. For example, localization based on range measurements lies at the core of technologies like radar, sonar and wireless sensor networks. In this manuscript we study in depth the model for source localization based on range measurements obtained from the source signal, from the point of view of algebraic geometry. In the case of three receivers, we find unexpected connections between this problem and the geometry of Kummer's and Cayley's surfaces. Our work also gives new insights on the localization based on range differences.
SD Sep 8, 2015
Source localization and denoising: a perspective from the TDOA space
Marco Compagnoni, Antonio Canclini, Paolo Bestagini et al.
In this manuscript, we formulate the problem of denoising Time Differences of Arrival (TDOAs) in the TDOA space, i.e. the Euclidean space spanned by TDOA measurements. The method consists of pre-processing the TDOAs with the purpose of reducing the measurement noise. The complete set of TDOAs (i.e., TDOAs computed at all microphone pairs) is known to form a redundant set, which lies on a linear subspace in the TDOA space. Noise, however, prevents TDOAs from lying exactly on this subspace. We therefore show that TDOA denoising can be seen as a projection operation that suppresses the component of the noise that is orthogonal to that linear subspace. We then generalize the projection operator also to the cases where the set of TDOAs is incomplete. We analytically show that this operator improves the localization accuracy, and we further confirm that via simulation.
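The projection view of TDOA denoising described in this abstract can be sketched for the complete-set case. The sketch assumes that the full set of TDOAs is written as D t, where t collects the arrival times and each row of D takes the difference for one sensor pair, so noise-free TDOAs lie in the column space of D. The names `tdoa_difference_matrix` and `denoise_tdoas`, the sensor layout and the noise level are hypothetical; the paper's own notation and its extension to incomplete sets are not reproduced.

```python
import numpy as np
from itertools import combinations

def tdoa_difference_matrix(n_sensors):
    """Matrix D such that D @ arrival_times gives the TDOA of every sensor pair."""
    pairs = list(combinations(range(n_sensors), 2))
    D = np.zeros((len(pairs), n_sensors))
    for row, (i, j) in enumerate(pairs):
        D[row, i] = 1.0
        D[row, j] = -1.0
    return D

def denoise_tdoas(tdoas, n_sensors):
    """Orthogonally project measured TDOAs onto the column space of D."""
    D = tdoa_difference_matrix(n_sensors)
    projector = D @ np.linalg.pinv(D)
    return projector @ tdoas

rng = np.random.default_rng(0)
n_sensors = 5
sensors = rng.uniform(0.0, 3.0, size=(n_sensors, 2))  # hypothetical 2D layout (m)
source = np.array([1.0, 2.0])
c = 343.0                                             # assumed speed of sound (m/s)
arrival_times = np.linalg.norm(sensors - source, axis=1) / c

D = tdoa_difference_matrix(n_sensors)
clean = D @ arrival_times
noisy = clean + 1e-4 * rng.normal(size=clean.size)
denoised = denoise_tdoas(noisy, n_sensors)

error_before = np.linalg.norm(noisy - clean)
error_after = np.linalg.norm(denoised - clean)
print(error_after <= error_before)
```

Since the projector leaves noise-free TDOAs unchanged and removes only the noise component orthogonal to the subspace, the Euclidean error after projection cannot exceed the error before it.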
MATH-PH Feb 10, 2014
A comprehensive analysis of the geometry of TDOA maps in localisation problems
Marco Compagnoni, Roberto Notari, Fabio Antonacci et al.
In this manuscript we consider the well-established problem of TDOA-based source localization and propose a comprehensive analysis of its solutions for arbitrary sensor measurements and placements. More specifically, we define the TDOA map from the physical space of source locations to the space of range measurements (TDOAs), in the specific case of three receivers in 2D space. We then study the identifiability of the model, giving a complete analytical characterization of the image of this map and its invertibility. This analysis has been conducted in a completely mathematical fashion, using many different tools which make it valid for every sensor configuration. These results are the first step towards the solution of more general problems involving, for example, a larger number of sensors, uncertainty in their placement, or lack of synchronization.
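As a concrete illustration of the map discussed in this abstract, the sketch below sends a 2D source position to two range differences for three receivers. Taking receiver 0 as the reference, the receiver coordinates and the function name `tdoa_map` are assumptions made for this example and do not follow the paper's notation. Range differences correspond to TDOAs multiplied by the propagation speed.

```python
import numpy as np

def tdoa_map(source, receivers):
    """Map a 2D source position to two range differences.

    receivers has shape (3, 2); receiver 0 is taken as the reference,
    which is a convention assumed for this sketch.
    """
    distances = np.linalg.norm(receivers - source, axis=1)
    return distances[1:] - distances[0]

receivers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # hypothetical layout
equidistant = tdoa_map(np.array([0.5, 0.5]), receivers)     # same distance to all
far_source = tdoa_map(np.array([3.0, 4.0]), receivers)

# Distance between each non-reference receiver and the reference receiver.
baselines = np.linalg.norm(receivers[1:] - receivers[0], axis=1)
print(equidistant, far_source, baselines)
```

By the triangle inequality, each range difference is bounded in magnitude by the corresponding baseline, so the image of the map is a bounded region of the plane. The paper's full characterization of that image and of the map's invertibility goes well beyond this simple bound.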