LGAug 30, 2024
Point Neuron Learning: A New Physics-Informed Neural Network Architecture
Hanwen Bi, Thushara D. Abhayapala
Machine learning and neural networks have advanced numerous research domains, but challenges such as large training data requirements and inconsistent model performance hinder their application in certain scientific problems. To overcome these challenges, researchers have investigated integrating physics principles into machine learning models, mainly through: (i) physics-guided loss functions, generally termed physics-informed neural networks, and (ii) physics-guided architectural design. While both approaches have demonstrated success across multiple scientific disciplines, they have limitations, including becoming trapped in local minima, poor interpretability, and restricted generalizability. This paper proposes a new physics-informed neural network (PINN) architecture that combines the strengths of both approaches by embedding the fundamental solution of the wave equation into the network architecture, enabling the learned model to strictly satisfy the wave equation. The proposed point neuron learning method can model an arbitrary sound field based on microphone observations without any dataset. Compared to other PINN methods, our approach directly processes complex numbers and offers better interpretability and generalizability. We evaluate the versatility of the proposed architecture on a sound field reconstruction problem in a reverberant environment. Results indicate that the point neuron method outperforms two competing methods and can efficiently handle noisy environments with sparse microphone observations.
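To make the idea of embedding the fundamental solution concrete, the following minimal sketch evaluates a frequency-domain sound field as a weighted sum of free-field point sources. It is not the authors' implementation: the function names, the source positions and weights, and the exp(-ikr)/(4πr) sign convention are assumptions chosen for illustration. Each term satisfies the Helmholtz equation away from its source point, so any weighted sum does too.

```python
import cmath
import math

def green_free_field(k, r_obs, r_src):
    """Free-field Green's function of the Helmholtz equation.

    Assumed convention: exp(-1j*k*d) / (4*pi*d), with d the source-observer distance.
    """
    d = math.dist(r_obs, r_src)
    return cmath.exp(-1j * k * d) / (4.0 * math.pi * d)

def point_neuron_field(k, r_obs, sources):
    """Pressure at r_obs as a weighted sum of point sources.

    sources: list of (position, complex weight) pairs (hypothetical values).
    """
    return sum(w * green_free_field(k, r_obs, pos) for pos, w in sources)

# Wavenumber at 500 Hz, assuming a speed of sound of 343 m/s.
k = 2.0 * math.pi * 500.0 / 343.0
sources = [((2.0, 0.0, 0.0), 1.0 + 0.0j), ((0.0, 3.0, 0.0), 0.5j)]
p = point_neuron_field(k, (0.0, 0.0, 0.0), sources)
```

In a learning setting such as the one the abstract describes, quantities like the positions and weights would be fitted to microphone observations; the abstract does not state which parameters are trained.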
ASFeb 2
RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses
Shaoheng Xu, Chunyi Sun, Jihui Zhang et al.
Room impulse responses (RIRs) are essential for many acoustic signal processing tasks, yet measuring them densely across space is often impractical. In this work, we propose RIR-Former, a grid-free, one-step feed-forward model for RIR reconstruction. By introducing a sinusoidal encoding module into a transformer backbone, our method effectively incorporates microphone position information, enabling interpolation at arbitrary array locations. Furthermore, a segmented multi-branch decoder is designed to separately handle early reflections and late reverberation, improving reconstruction across the entire RIR. Experiments on diverse simulated acoustic environments demonstrate that RIR-Former consistently outperforms state-of-the-art baselines in terms of normalized mean square error (NMSE) and cosine distance (CD), under varying missing rates and array configurations. These results highlight the potential of our approach for practical deployment and motivate future work on scaling from randomly spaced linear arrays to complex array geometries, dynamic acoustic scenes, and real-world environments.
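As a rough illustration of what a sinusoidal encoding of a microphone coordinate can look like, the sketch below maps a scalar position to sine and cosine features at geometrically spaced frequencies. The frequency schedule (powers of two times π), the function name, and the default feature count are assumptions; the abstract does not specify the encoding that RIR-Former uses.

```python
import math

def sinusoidal_encoding(x, num_freqs=4):
    """Encode a scalar coordinate as [sin(2^i*pi*x), cos(2^i*pi*x)] for i < num_freqs."""
    features = []
    for i in range(num_freqs):
        angle = (2.0 ** i) * math.pi * x
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

# Encode a hypothetical normalized microphone position.
enc = sinusoidal_encoding(0.25, num_freqs=3)
```

Because the encoding is a continuous function of the coordinate, a model that consumes it can be queried at positions that lie between the measured ones, which is what makes a grid-free formulation possible.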
ASMar 30
BiFormer3D: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer
Shaoheng Xu, Chunyi Sun, Jihui Zhang et al.
Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict HRIRs at unmeasured target directions. Prior learning methods often work in the frequency domain, rely on minimum-phase assumptions or separate timing models, and use a fixed direction grid, which can degrade temporal fidelity and spatial continuity. We propose BiFormer3D, a time-domain, grid-free binaural Transformer for reconstructing HRIRs at arbitrary directions from sparse inputs. It uses sinusoidal spatial features, a Conv1D refinement module, and auxiliary interaural time difference (ITD) and interaural level difference (ILD) heads. On SONICOM, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD errors over prior methods; ablations validate modules and show minimum-phase pre-processing is unnecessary.
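The two reconstruction metrics named above can be sketched as follows. These are common textbook definitions (NMSE as error energy over reference energy, in dB; cosine distance as one minus the cosine similarity) and may differ in detail from the ones used in the papers. The function names and the toy signals are illustrative assumptions.

```python
import math

def nmse_db(estimate, reference):
    """Normalized mean squared error in dB (undefined for a perfect estimate)."""
    err = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    ref = sum(r ** 2 for r in reference)
    return 10.0 * math.log10(err / ref)

def cosine_distance(estimate, reference):
    """One minus the cosine similarity of two equal-length signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    return 1.0 - dot / (norm_e * norm_r)

# Toy impulse responses (hypothetical values).
reference = [1.0, 0.0, -0.5]
estimate = [0.9, 0.1, -0.5]
score_db = nmse_db(estimate, reference)
distance = cosine_distance(estimate, reference)
```

NMSE is sensitive to amplitude errors, whereas cosine distance ignores overall scale and measures only how well the waveform shapes align, which is one reason the two are often reported together.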
ASJan 16, 2019
Real-time separation of non-stationary sound fields on spheres
Fei Ma, Wen Zhang, Thushara D. Abhayapala
Sound field separation methods can separate the target field from interfering noise, facilitating the study of the acoustic characteristics of a target source placed in a noisy environment. However, most existing sound field separation methods are derived in the frequency domain and are thus best suited for separating stationary sound fields. In this paper, a time-domain sound field separation method is developed that can separate the non-stationary sound field generated by the target source over a sphere in real time. A spherical array sets up a boundary between the target source and the interfering sources, such that the outgoing field on the array is only generated by the target source. The proposed method decomposes the pressure and the radial particle velocity measured by the array into spherical harmonics coefficients, and recovers the target outgoing field based on the time-domain relationship between the decomposition coefficients and the theoretically derived spatial filter responses. Simulations show the proposed method can separate non-stationary sound fields both in free field and room environments, and over a longer duration with small errors. The proposed method could serve as a foundation for developing future time-domain spatial sound field manipulation algorithms.
SDMay 16, 2018
PSD Estimation and Source Separation in a Noisy Reverberant Environment using a Spherical Microphone Array
Abdullah Fahim, Prasanga N. Samarasinghe, Thushara D. Abhayapala
In this paper, we propose an efficient technique for estimating individual power spectral density (PSD) components, i.e., PSD of each desired sound source as well as of noise and reverberation, in a multi-source reverberant sound scene with coherent background noise. We formulate the problem in the spherical harmonics domain to take advantage of the inherent orthogonality of the spherical harmonics basis functions and extract the PSD components from the cross-correlation between the different sound field modes. We also investigate an implementation issue that occurs at the nulls of the Bessel functions and offer an engineering solution. The performance evaluation takes place in a practical environment with a commercial microphone array in order to measure the robustness of the proposed algorithm against all the deviations incurred in practice. We also exhibit an application of the proposed PSD estimator through a source separation algorithm and compare the performance with a contemporary method in terms of different objective measures.
SDMar 1, 2018
Mode Domain Spatial Active Noise Control Using Sparse Signal Representation
Yu Maeno, Yuki Mitsufuji, Thushara D. Abhayapala
Active noise control (ANC) over a sizeable space requires a large number of reference and error microphones to satisfy the spatial Nyquist sampling criterion, which limits the feasibility of practical realization of such systems. This paper proposes a mode-domain feedforward ANC method to attenuate the noise field over a large space while reducing the number of microphones required. We adopt a sparse reference signal representation to precisely calculate the reference mode coefficients. The proposed system consists of circular reference and error microphone arrays, which capture the reference noise signal and residual error signal, respectively, and a circular loudspeaker array to drive the anti-noise signal. Experimental results indicate that above the spatial Nyquist frequency, our proposed method can perform well compared to conventional methods. Moreover, the proposed method can even reduce the number of reference microphones while achieving better noise attenuation.
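To give a sense of why the microphone count grows with frequency and region size, the sketch below applies a widely used rule of thumb for circular arrays: truncate the mode expansion at order N = ceil(kR) and use at least 2N + 1 microphones. This rule, the function name, and the default speed of sound are assumptions for illustration; the abstract does not state which criterion the paper uses.

```python
import math

def required_microphones(freq_hz, radius_m, speed_of_sound=343.0):
    """Rule-of-thumb microphone count for a circular array of the given radius.

    Assumes mode truncation order N = ceil(k * R) and 2N + 1 microphones.
    """
    k = 2.0 * math.pi * freq_hz / speed_of_sound  # wavenumber in rad/m
    order = math.ceil(k * radius_m)
    return 2 * order + 1

mics_small = required_microphones(500.0, 0.5)   # 0.5 m radius at 500 Hz
mics_large = required_microphones(1000.0, 1.0)  # 1 m radius at 1 kHz
```

Under this rule, doubling both the frequency and the radius roughly quadruples the required count, which is the kind of scaling pressure that motivates sparse representations.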
SDSep 5, 2017
PSD Estimation of Multiple Sound Sources in a Reverberant Room Using a Spherical Microphone Array
Abdullah Fahim, Prasanga N. Samarasinghe, Thushara D. Abhayapala
We propose an efficient method to estimate source power spectral densities (PSDs) in a multi-source reverberant environment using a spherical microphone array. The proposed method utilizes the spatial correlation between the spherical harmonics (SH) coefficients of a sound field to estimate source PSDs. The use of the spatial cross-correlation of the SH coefficients allows us to employ the method in an environment with a higher number of sources compared to conventional methods. Furthermore, the orthogonality property of the SH basis functions saves the effort of designing specific beampatterns of a conventional beamformer-based method. We evaluate the performance of the algorithm with different numbers of sources in practical reverberant and non-reverberant rooms. We also demonstrate an application of the method by separating source signals using a conventional beamformer and a Wiener post-filter designed from the estimated PSDs.
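A Wiener post-filter built from estimated PSDs can be sketched in its simplest per-frequency-bin form as the ratio of the target PSD to the total PSD. The function name, the small denominator floor, and the toy PSD values are assumptions; the paper's actual filter design may include further terms or smoothing.

```python
def wiener_gain(psd_target, psd_other, floor=1e-12):
    """Per-bin Wiener gain: target PSD divided by target-plus-interference PSD.

    psd_other lumps together everything that is not the target
    (other sources, reverberation, noise). floor guards against division by zero.
    """
    return [s / max(s + n, floor) for s, n in zip(psd_target, psd_other)]

# Toy PSD estimates for three frequency bins (hypothetical values).
gains = wiener_gain([1.0, 0.5, 0.0], [1.0, 1.5, 2.0])
```

The gain stays between 0 and 1 and would be applied to the beamformer output in each bin, attenuating the bins where the estimated interference dominates.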
SDOct 30, 2015
Estimation of the direct-to-reverberant Energy Ratio using a spherical microphone array
Hanchi Chen, Prasanga N. Samarasinghe, Thushara D. Abhayapala et al.
This paper proposes a practical approach to estimate the direct-to-reverberant energy ratio (DRR) using a spherical microphone array without having knowledge of the source signal. We base our estimation on a theoretical relationship between the DRR and the coherence estimation function between coincident pressure and particle velocity. We discuss the proposed method's ability to estimate the DRR in a wide variety of room sizes, reverberation times and source-receiver distances with appropriate examples. Test results show that the method can estimate the room DRR for frequencies between 199 and 2511 Hz, with $\pm 3$ dB accuracy.
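For reference, the quantity being estimated can be computed directly when a room impulse response is available. The sketch below uses that standard definition (direct-path energy over the energy of everything after it, in dB); it is not the paper's blind, coherence-based estimator. The function name, the split index, and the toy response are illustrative assumptions.

```python
import math

def drr_db(impulse_response, direct_end):
    """DRR in dB from a known impulse response.

    Samples before direct_end are treated as the direct path;
    all later samples are treated as reverberation.
    """
    direct = sum(h * h for h in impulse_response[:direct_end])
    reverb = sum(h * h for h in impulse_response[direct_end:])
    return 10.0 * math.log10(direct / reverb)

# Toy impulse response: a direct spike followed by two reflections.
h = [1.0, 0.0, 0.5, 0.5, 0.0]
value = drr_db(h, direct_end=2)
```

A blind estimator such as the one proposed aims to recover this ratio without access to the impulse response or the source signal, so the definition above serves only as the ground truth it is compared against.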