AS Mar 13, 2023
Blind Acoustic Room Parameter Estimation Using Phase Features
Christopher Ick, Adib Mehrabi, Wenyu Jin
Modeling room acoustics in a field setting involves some degree of blind parameter estimation from noisy and reverberant audio. Modern approaches leverage convolutional neural networks (CNNs) in tandem with time-frequency representations. Using short-time Fourier transforms to develop these spectrogram-like features has shown promising results, but this method implicitly discards a significant amount of audio information in the phase domain. Inspired by recent works in speech enhancement, we propose utilizing novel phase-related features to extend recent approaches to blindly estimate the so-called "reverberation fingerprint" parameters, namely, volume and RT60. The addition of these features is shown to outperform existing methods that rely solely on magnitude-based spectral features across a wide range of acoustic spaces. We evaluate the effectiveness of these novel features in both single-parameter and multi-parameter estimation strategies, using a novel dataset that consists of publicly available room impulse responses (RIRs), synthesized RIRs, and in-house measurements of real acoustic spaces.
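To make the magnitude-versus-phase distinction concrete, the following is a minimal sketch of how phase information from a short-time Fourier transform could be kept alongside the magnitude as extra input channels. The abstract does not say which phase features the authors use; the cosine/sine encoding, the function names, and the frame parameters here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Plain Hann-windowed STFT; returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def magnitude_and_phase_features(x, n_fft=512, hop=256):
    """Stack log-magnitude with cos/sin of the phase (assumed encoding, hypothetical helper)."""
    spec = stft(x, n_fft, hop)
    log_mag = np.log1p(np.abs(spec))   # magnitude-only feature, as in spectrogram inputs
    phase = np.angle(spec)             # the information a magnitude spectrogram discards
    # cos/sin avoids the wrap-around discontinuity of the raw phase angle at +/- pi
    return np.stack([log_mag, np.cos(phase), np.sin(phase)])

rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)    # stand-in for one second of audio at 16 kHz
features = magnitude_and_phase_features(signal)
print(features.shape)                  # (channels, frequency bins, frames)
```

The resulting three-channel array could be fed to a CNN in place of a single-channel spectrogram; whether that matches the feature layout used in the paper is not stated in the abstract.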
AS Oct 8, 2021
Individualized Hear-through For Acoustic Transparency Using PCA-Based Sound Pressure Estimation At The Eardrum
Wenyu Jin, Tim Schoof, Henning Schepker
The hear-through functionality on hearing devices, which allows hearing equivalent to the open ear while providing the possibility to modify the sound pressure at the eardrum in a desired manner, has drawn great attention from researchers in recent years. To this end, the output of the device is processed by means of an equalization filter, such that the transfer function between external sound sources and the eardrum is equivalent for the open-ear condition and the aided condition with the device in the ear. To achieve ideal performance, the equalization filter design assumes exact knowledge of all the relevant acoustic transfer functions. A particular challenge is the transfer function between the hearing device receiver and the eardrum, which is difficult to obtain in practice as it requires additional probe-tube measurements. In this work, we address this issue by proposing an individualized hear-through equalization filter design that leverages the measurement of the so-called secondary path to predict the sound pressure at the eardrum. Experimental results using real-ear measured transfer functions confirm that the proposed method achieves good sound quality relative to the open ear while outperforming filter designs that do not leverage the proposed estimator.
AS Aug 7, 2020
Classification of Huntington Disease using Acoustic and Lexical Features
Matthew Perez, Wenyu Jin, Duc Le et al.
Speech is a critical biomarker for Huntington Disease (HD), with changes in speech increasing in severity as the disease progresses. Speech analyses are currently conducted using either transcriptions created manually by trained professionals or global rating scales. Manual transcription is both expensive and time-consuming, and global rating scales may lack sufficient sensitivity and fidelity. Ultimately, what is needed is an unobtrusive measure that can cheaply and continuously track disease progression. We present first steps towards the development of such a system, demonstrating the ability to automatically differentiate between healthy controls and individuals with HD using speech cues. The results provide evidence that objective analyses can be used to support clinical diagnoses, moving towards the tracking of symptomatology outside of laboratory and clinical environments.
AS Jun 20, 2019
A Signal Subspace Rotation Method for Localization of Multiple Wideband Sound Sources
Kainan Chen, Wenyu Jin, Bharadwaj Desikan
In this paper, the problem of extending narrowband multichannel sound source localization algorithms to the wideband case is addressed. Narrowband algorithms estimate the direction of arrival (DOA) from the inter-channel phase differences (IPDs) that the sound sources produce between microphones. A new method for wideband sound source DOA estimation based on signal subspace rotation is presented. The proposed algorithm normalizes the narrowband signal statistics by rotating the estimated signal subspace to the wideband counterpart in the eigenvector domain. The wideband DOA estimate can then be obtained by estimating the normalized IPD from these wideband signal statistics. In addition to requiring less computation than repeating the narrowband algorithms for all relevant frequencies of wideband signals, the proposed method does not require any additional prior knowledge. The experimental results demonstrate the efficacy and the robustness of the proposed method.
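As background for the narrowband starting point, the sketch below shows the textbook relation between an inter-channel phase difference and a direction of arrival for a single far-field source and two microphones. It does not implement the subspace rotation method of the paper; the two-microphone geometry, the complex-exponential test signal, and all names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def simulate_ipd(theta_rad, freq_hz, mic_spacing_m, fs=16000, n_samples=1600):
    """Phase difference between two microphones for one far-field narrowband source."""
    delay = mic_spacing_m * np.sin(theta_rad) / SPEED_OF_SOUND
    t = np.arange(n_samples) / fs
    x1 = np.exp(2j * np.pi * freq_hz * t)            # reference microphone
    x2 = np.exp(2j * np.pi * freq_hz * (t - delay))  # delayed copy at the second microphone
    # The phase of the averaged cross-spectrum is the inter-channel phase difference
    return np.angle(np.sum(x1 * np.conj(x2)))

def doa_from_ipd(ipd_rad, freq_hz, mic_spacing_m):
    """Invert IPD = 2*pi*f*d*sin(theta)/c for theta (valid without spatial aliasing)."""
    sin_theta = ipd_rad * SPEED_OF_SOUND / (2 * np.pi * freq_hz * mic_spacing_m)
    return np.arcsin(np.clip(sin_theta, -1.0, 1.0))

true_deg = 30.0
ipd = simulate_ipd(np.deg2rad(true_deg), freq_hz=1000.0, mic_spacing_m=0.05)
estimated_deg = np.rad2deg(doa_from_ipd(ipd, freq_hz=1000.0, mic_spacing_m=0.05))
print(f"estimated DOA: {estimated_deg:.2f} degrees")
```

Because the IPD scales with frequency, this inversion has to be repeated per frequency bin for a wideband signal, which is the cost the abstract says the proposed normalization avoids.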
SD May 30, 2017
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen et al.
Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.