SD, Dec 29, 2016

Phase-incorporating Speech Enhancement Based on Complex-valued Gaussian Process Latent Variable Model

arXiv:1612.09150v2
AI Analysis

This work addresses speech enhancement for noisy audio signals. It is incremental: it builds on existing Gaussian process methods by adding phase modification.

The paper tackled speech enhancement by directly modifying both magnitude and phase of noisy speech spectra using a complex-valued Gaussian process latent variable model, and experiments on the CHTTL database showed it outperformed baseline methods on several standard measures.

Traditional speech enhancement techniques modify the magnitude of a speech signal in the time-frequency domain and reuse the phase of the noisy speech to resynthesize the time-domain signal. This work proposes a complex-valued Gaussian process latent variable model (CGPLVM) that directly enhances the complex-valued noisy spectrum, modifying not only the magnitude but also the phase. The main idea underlying the method is to model the short-time Fourier transform (STFT) coefficients across the time frames of a speech signal as a proper complex Gaussian process (GP) with added noise. The method is based on projecting the spectrum into a low-dimensional subspace. The hyperparameters of the model are optimized using the likelihood criterion. Experiments were carried out on the CHTTL database, which contains the digits zero to nine in Mandarin. Several standard measures are used to demonstrate that the proposed method outperforms baseline methods.
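To make the idea of a proper complex GP with added noise concrete, here is a minimal sketch in Python. It is not the paper's CGPLVM: it leaves out the latent variables and the low-dimensional projection, and only runs plain GP regression on a single synthetic STFT bin across time frames. The kernel form, the function names (`complex_kernel`, `gp_posterior_mean`, `log_marginal_likelihood`), the toy signal, and the grid of length scales are all assumptions made for illustration. The hyperparameter choice uses the marginal likelihood of a proper complex Gaussian, which is one plausible reading of "the likelihood criterion".

```python
import numpy as np

def complex_kernel(t1, t2, length_scale=3.0, omega=0.4):
    # Assumed kernel form (not from the paper): a squared-exponential envelope
    # times a complex rotation. The result is Hermitian and positive semi-definite.
    d = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2) * np.exp(1j * omega * d)

def gp_posterior_mean(y, t, noise_var, **kernel_args):
    # Posterior mean K (K + noise_var * I)^{-1} y of a zero-mean proper complex GP
    # observed in circular complex Gaussian noise.
    K = complex_kernel(t, t, **kernel_args)
    A = K + noise_var * np.eye(len(t))
    return K @ np.linalg.solve(A, y)

def log_marginal_likelihood(y, t, noise_var, **kernel_args):
    # Log density of a proper complex Gaussian CN(y; 0, K + noise_var * I).
    A = complex_kernel(t, t, **kernel_args) + noise_var * np.eye(len(t))
    _, logdet = np.linalg.slogdet(A)
    quad = np.real(np.vdot(y, np.linalg.solve(A, y)))  # y^H A^{-1} y
    return -quad - logdet - len(t) * np.log(np.pi)

# Synthetic trajectory of one STFT bin over 64 frames (toy data, not CHTTL).
rng = np.random.default_rng(0)
t = np.arange(64, dtype=float)
clean = np.exp(-0.5 * ((t - 32.0) / 12.0) ** 2) * np.exp(1j * 0.4 * t)
noise_var = 0.1
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
noisy = clean + noise

# Pick the length scale with the highest marginal likelihood on a small grid.
candidates = [1.0, 3.0, 9.0]
scores = [log_marginal_likelihood(noisy, t, noise_var, length_scale=ls) for ls in candidates]
best_length_scale = candidates[int(np.argmax(scores))]

# Complex-valued enhancement: magnitude and phase are both modified.
enhanced = gp_posterior_mean(noisy, t, noise_var, length_scale=best_length_scale)

# Traditional-style variant: keep the enhanced magnitude, reuse the noisy phase.
noisy_phase = np.abs(enhanced) * np.exp(1j * np.angle(noisy))

def mse(x):
    # Mean squared error against the known clean toy signal.
    return float(np.mean(np.abs(x - clean) ** 2))

err_noisy = mse(noisy)
err_enhanced = mse(enhanced)
err_noisy_phase = mse(noisy_phase)
print("selected length scale:", best_length_scale)
print("MSE noisy:", err_noisy)
print("MSE complex GP:", err_enhanced)
print("MSE enhanced magnitude + noisy phase:", err_noisy_phase)
```

The three printed errors allow a comparison between the noisy input, the complex-valued estimate, and the variant that reuses the noisy phase. A real system would apply such a model to every frequency bin of an STFT and invert the result back to the time domain.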

