Toon van Waterschoot

AS
7 papers
89 citations
Novelty 38%
AI Score 42

7 Papers

14.5 AS Apr 18
A state-space representation of the boundary integral equation for room acoustic modelling

Randall Ali, Thomas Dietzen, Matteo Scerbo et al.

We introduce a new framework for room acoustics modelling based on a state-space model of the boundary integral equation representing the sound field in a room. Whereas state-space models of linear time-invariant systems are traditionally constructed by means of a state vector and a 4-tuple of system matrices, the state-space representation introduced in this work consists of a state function representing the pressure distribution at the room boundary, and a 4-tuple of integral operators. We refer to this representation as a boundary integral operator state-space (BIOSS) model and provide a physical interpretation for each of the integral operators. As many mathematical operations on vectors and matrices translate to functions and operators, the BIOSS representation can be manipulated to obtain two transfer function representations, having either a feedback or a parallel feedforward structure. Consequently, various equivalent representations for room acoustics are obtained in the BIOSS framework, in the time or frequency domain, and in continuous or discrete space. We discuss two future directions for how the proposed framework can be fertile for research on room acoustics modelling. Firstly, we identify equivalences between the BIOSS framework and various existing room acoustics models (boundary element models, delay networks, geometric models), which may be used to establish relations between existing models and to develop novel room acoustics models. Secondly, we postulate on how concepts from state-space theory, such as observability, controllability, and state realization, can be used for developing new inference and control methods for room acoustics.
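For reference, the classical finite-dimensional case that the abstract contrasts with BIOSS can be sketched in a few lines. This is only a minimal illustration of a matrix 4-tuple (A, B, C, D) and its transfer function H(z) = D + C (zI - A)^{-1} B, not the operator-valued BIOSS model itself; the toy matrices and function names are assumptions made for illustration.

```python
import numpy as np

def transfer_function(A, B, C, D, z):
    """Evaluate H(z) = D + C (zI - A)^{-1} B for a discrete-time LTI system."""
    n = A.shape[0]
    return D + C @ np.linalg.solve(z * np.eye(n) - A, B)

def impulse_response(A, B, C, D, num_samples):
    """Return h[0] = D and h[k] = C A^(k-1) B for k >= 1."""
    h = [D.copy()]
    x = B.copy()  # state after a unit impulse at time 0
    for _ in range(1, num_samples):
        h.append(C @ x)
        x = A @ x
    return np.array(h)

# Hypothetical, stable toy system (eigenvalues of A are 0.5 and 0.3).
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.2]])

z = np.exp(1j * 0.3)  # a point on the unit circle
H_closed_form = transfer_function(A, B, C, D, z)

# The same value obtained by summing the impulse response: sum_k h[k] z^(-k).
h = impulse_response(A, B, C, D, 200)
H_series = sum(h[k] * z ** (-k) for k in range(len(h)))
```

The two evaluations agree because the feedback form (the matrix inverse) and the feedforward form (the power series in A) describe the same system, which mirrors the feedback and parallel feedforward structures mentioned in the abstract.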

7.5 AS Mar 10
Distributed Multichannel Wiener Filtering for Wireless Acoustic Sensor Networks

Paul Didier, Toon van Waterschoot, Simon Doclo et al.

In a wireless acoustic sensor network (WASN), devices (i.e., nodes) can collaborate through distributed algorithms to collectively perform audio signal processing tasks. This paper focuses on the distributed estimation of node-specific desired speech signals using network-wide Wiener filtering. The objective is to match the performance of a centralized system that would have access to all microphone signals, while reducing the communication bandwidth usage of the algorithm. Existing solutions, such as the distributed adaptive node-specific signal estimation (DANSE) algorithm, converge towards the multichannel Wiener filter (MWF) which solves a centralized linear minimum mean square error (LMMSE) signal estimation problem. However, they do so iteratively, which can be slow and impractical. Many solutions also assume that all nodes observe the same set of sources of interest, which is often not the case in practice. To overcome these limitations, we propose the distributed multichannel Wiener filter (dMWF) for fully connected WASNs. The dMWF is non-iterative and optimal even when nodes observe different sets of sources. In this algorithm, nodes exchange neighbor-pair-specific, low-dimensional (fused) signals estimating the contribution of sources observed by both nodes in the pair. We formally prove the optimality of dMWF and demonstrate its performance in simulated speech enhancement experiments. The proposed algorithm is shown to outperform DANSE in terms of objective metrics after short operation times, highlighting the benefit of its iterationless design.
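The centralized MWF that serves as the benchmark above can be sketched as follows. This is a minimal real-valued illustration of the standard LMMSE solution w = Ryy^{-1} Rss e_ref with Rss = Ryy - Rnn, assuming speech and noise are uncorrelated; it is not the dMWF or DANSE, and the signal model, steering vector, and variable names are hypothetical.

```python
import numpy as np

def mwf(Ryy, Rnn, ref=0):
    """Centralized multichannel Wiener filter for the reference microphone.

    Assumes speech and noise are uncorrelated, so that Rss = Ryy - Rnn.
    """
    Rss = Ryy - Rnn
    return np.linalg.solve(Ryy, Rss[:, ref])

rng = np.random.default_rng(0)
num_mics, num_samples = 4, 20000

# Hypothetical scenario: one source reaching the microphones with fixed gains.
gains = np.array([1.0, 0.8, 0.6, 0.4])
source = rng.standard_normal(num_samples)
speech = np.outer(gains, source)                  # desired component per mic
noise = rng.standard_normal((num_mics, num_samples))
y = speech + noise                                # all microphone signals

# Sample correlation matrices (the noise-only one would come from a VAD in practice).
Ryy = y @ y.T / num_samples
Rnn = noise @ noise.T / num_samples

w = mwf(Ryy, Rnn, ref=0)
estimate = w @ y                                  # estimated speech at mic 0

mse_unprocessed = np.mean((y[0] - speech[0]) ** 2)
mse_filtered = np.mean((estimate - speech[0]) ** 2)
```

A distributed algorithm aims to reach the same filter output without any node ever gathering the full matrix Ryy.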

AS Jul 1, 2020 Code
Instantaneous PSD Estimation for Speech Enhancement based on Generalized Principal Components

Thomas Dietzen, Marc Moonen, Toon van Waterschoot

Power spectral density (PSD) estimates of various microphone signal components are essential to many speech enhancement procedures. As speech is highly non-stationary, performance improvements may be gained by maintaining time-variations in PSD estimates. In this paper, we propose an instantaneous PSD estimation approach based on generalized principal components. Similar to other eigenspace-based PSD estimation approaches, we rely on recursive averaging in order to obtain a microphone signal correlation matrix estimate to be decomposed. However, instead of estimating the PSDs directly from the temporally smooth generalized eigenvalues of this matrix, yielding temporally smooth PSD estimates, we propose to estimate the PSDs from newly defined instantaneous generalized eigenvalues, yielding instantaneous PSD estimates. The instantaneous generalized eigenvalues are defined from the generalized principal components, i.e., a generalized eigenvector-based transform of the microphone signals. We further show that the smooth generalized eigenvalues can be understood as a recursive average of the instantaneous generalized eigenvalues. Simulation results comparing the multi-channel Wiener filter (MWF) with smooth and instantaneous PSD estimates indicate better speech enhancement performance for the latter. A MATLAB implementation is available online.
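The relation between smooth and instantaneous generalized eigenvalues can be illustrated with a simplified sketch. It assumes a batch average over frames instead of the recursive average used in the paper, eigenvectors that are fixed over time, and instantaneous generalized eigenvalues taken as the squared magnitudes of the generalized principal components; the abstract does not spell out the exact definition, so all of these are assumptions, as are the simulated signals and names.

```python
import numpy as np

def gevd(Ryy, Rnn):
    """Generalized eigendecomposition of (Ryy, Rnn) via Cholesky whitening.

    Returns eigenvalues in descending order and eigenvectors P scaled such
    that P^H Rnn P = I and P^H Ryy P = diag(eigenvalues).
    """
    L = np.linalg.cholesky(Rnn)
    L_inv = np.linalg.inv(L)
    lam, V = np.linalg.eigh(L_inv @ Ryy @ L_inv.conj().T)
    P = L_inv.conj().T @ V
    return lam[::-1], P[:, ::-1]

rng = np.random.default_rng(1)
num_mics, num_frames = 3, 500

# Hypothetical STFT frames in one frequency bin: one source plus white noise.
steering = np.array([1.0, 0.7 - 0.2j, 0.4 + 0.5j])
source = rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)
noise = 0.5 * (rng.standard_normal((num_mics, num_frames))
               + 1j * rng.standard_normal((num_mics, num_frames)))
Y = steering[:, None] * source + noise

Ryy = Y @ Y.conj().T / num_frames   # batch average (paper: recursive average)
Rnn = 0.5 * np.eye(num_mics)        # assumed known noise correlation matrix

smooth_eigvals, P = gevd(Ryy, Rnn)

# Generalized principal components: eigenvector-based transform of the signals.
components = P.conj().T @ Y
# Assumed definition: per-frame squared magnitude of each component.
instantaneous_eigvals = np.abs(components) ** 2
```

Under these assumptions, averaging `instantaneous_eigvals` over frames reproduces `smooth_eigvals` exactly, which is the batch analogue of the recursive-average relation stated in the abstract.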

AS Dec 17, 2020
Low-Complexity Steered Response Power Mapping based on Nyquist-Shannon Sampling

Thomas Dietzen, Enzo De Sena, Toon van Waterschoot

The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the dense grid of candidate locations, each of which requires inverse Fourier transform (IFT) evaluations, conventional SRP exhibits a high computational complexity. In this paper, we propose a low-complexity SRP approach based on Nyquist-Shannon sampling. Noting that on the one hand the range of possible TDOAs is physically bounded, while on the other hand the GCCs are bandlimited, we critically sample the GCCs around their TDOA interval and approximate the SRP map by interpolation. In usual setups, the number of sample points can be orders of magnitude less than the number of candidate locations and frequency bins, yielding a significant reduction of IFT computations at a limited interpolation cost. Simulations comparing the proposed approximation with conventional SRP indicate low approximation errors and equal localization performance. MATLAB and Python implementations are available online.
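The sampling-and-interpolation idea can be sketched for a single microphone pair. The example below assumes an idealized bandlimited GCC (a shifted sinc), critical sampling at twice the bandwidth, a plain truncated sinc interpolator, and a few extra samples beyond the physical TDOA range; the bandwidth, margin, and function names are hypothetical, and the interpolation details of the paper may differ.

```python
import numpy as np

def sample_gcc(gcc, tau_max, fs_crit, margin):
    """Sample a GCC at the critical rate over [-tau_max, tau_max] plus a margin."""
    n_max = int(np.ceil(tau_max * fs_crit)) + margin
    n = np.arange(-n_max, n_max + 1)
    return n, gcc(n / fs_crit)

def interpolate_gcc(n, samples, tau, fs_crit):
    """Truncated sinc interpolation of the sampled GCC at arbitrary lags tau."""
    tau = np.atleast_1d(tau)
    kernel = np.sinc(tau[:, None] * fs_crit - n[None, :])
    return kernel @ samples

bandwidth = 4000.0                 # Hz, assumed GCC bandwidth
fs_crit = 2 * bandwidth            # critical (Nyquist) sampling rate
tau_max = 5 / fs_crit              # assumed physical TDOA bound for this pair
true_tdoa = 2.3 / fs_crit          # hypothetical source TDOA

def gcc(tau):
    """Idealized bandlimited GCC peaking at the true TDOA."""
    return np.sinc(fs_crit * (tau - true_tdoa))

# A few dozen samples describe the GCC over the whole TDOA interval ...
n, samples = sample_gcc(gcc, tau_max, fs_crit, margin=20)

# ... and are reused for a dense set of candidate-location TDOAs.
candidate_tdoas = np.linspace(-tau_max, tau_max, 1001)
approx = interpolate_gcc(n, samples, candidate_tdoas, fs_crit)
exact = gcc(candidate_tdoas)

max_error = np.max(np.abs(approx - exact))
estimated_tdoa = candidate_tdoas[np.argmax(approx)]
```

The number of stored samples depends only on the bandwidth and the TDOA range, not on how many candidate locations are evaluated, which is where the complexity reduction comes from.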

AS Dec 17, 2018
A multi-layered energy consumption model for smart wireless acoustic sensor networks

Gert Dekkers, Fernando Rosas, Steven Lauwereins et al.

Smart sensing is expected to become a pervasive technology in smart cities and environments of the near future. Such services are becoming more capable as integrated devices shrink in size while retaining enough computational power to run diverse machine learning algorithms and achieve high performance in various data-processing tasks. One attractive sensor modality for smart sensing is acoustic sensing, which can convey highly informative data while keeping energy consumption moderate. Unfortunately, the energy budget of current wireless sensor networks is usually not enough to support the requirements of standard microphones. Therefore, energy efficiency needs to be increased at all layers (sensing, signal processing, and communication) in order to bring wireless smart acoustic sensors to market. To help attain this goal, this paper introduces WASN-EM: an energy consumption model for wireless acoustic sensor networks (WASNs), whose aim is to aid in the development of novel techniques to increase the energy efficiency of smart wireless acoustic sensors. This model provides a first step of exploration prior to custom design of a smart wireless acoustic sensor, and can also be used to compare the energy consumption of different protocols.

AS Jul 30, 2018
DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics

Gert Dekkers, Lode Vuegen, Toon van Waterschoot et al.

The DCASE 2018 Challenge consists of five tasks related to automatic classification and detection of sound events and scenes. This paper presents the setup of Task 5, including a description of the task, the dataset, and the baseline system. The task investigates to what extent multi-channel acoustic recordings are beneficial for classifying domestic activities. The goal is to exploit spectral and spatial cues independent of sensor location using multi-channel audio. For this purpose, we provide development and evaluation datasets, which are derived from the SINS database and contain domestic activities recorded by multiple microphone arrays. The baseline system, based on a neural network architecture using convolutional and dense layer(s), is intended to lower the hurdle to participating in the challenge and to provide a reference performance.

SD Oct 15, 2015
Evaluating the Non-Intrusive Room Acoustics Algorithm with the ACE Challenge

Pablo Peso Parada, Dushyant Sharma, Toon van Waterschoot et al.

We present a single-channel, data-driven method for non-intrusive estimation of full-band reverberation time (T60) and full-band direct-to-reverberant ratio (DRR). The method extracts a number of features from reverberant speech and builds a model using a recurrent neural network to estimate the reverberant acoustic parameters. We explore three configurations by including different data and also by combining the recurrent neural network estimates using a support vector machine. Our best method for estimating DRR yields a root mean square deviation (RMSD) of 3.84 dB, and an RMSD of 43.19% for T60 estimation.