SD, LG, AS | Oct 29, 2019

On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

arXiv:1910.13288v1
Originality: Synthesis-oriented
AI Analysis

This paper addresses speech factorization for speech processing tasks, but the contribution is incremental because it builds on existing normalization flow methods.

The paper tackles the problem of decomposing speech signals into independent factors, such as phonetic content and speaker traits, using an unsupervised normalization flow model. It finds that the latent code space exhibits favorable properties such as denseness and pseudo-linearity, and that these factors correspond to particular directions in that space.

Speech signals are complex composites of various information, including phonetic content, speaker traits, channel effects, etc. Decomposing this complicated mixture into independent factors, i.e., speech factorization, is fundamentally important and plays a central role in many important algorithms of modern speech processing tasks. In this paper, we present a preliminary investigation on unsupervised speech factorization based on the normalization flow model. This model constructs a complex invertible transform, by which we can project speech segments into a latent code space where the distribution is a simple diagonal Gaussian. Our preliminary investigation on the TIMIT database shows that this code space exhibits favorable properties such as denseness and pseudo-linearity, and perceptually important factors such as phonetic content and speaker traits can be represented as particular directions within the code space.
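
The invertible transform described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: it is a single RealNVP-style affine coupling layer with small random linear maps standing in for the neural networks a real flow would use, and random vectors standing in for speech segments. The names `AffineCoupling` and `log_likelihood`, the dimension 8, and the chosen latent direction are all assumptions made for illustration.

```python
import numpy as np


class AffineCoupling:
    """Toy affine coupling layer (hypothetical, for illustration only)."""

    def __init__(self, dim, rng):
        self.d = dim // 2
        # Random linear maps stand in for the scale and shift networks.
        self.w_s = 0.1 * rng.standard_normal((self.d, dim - self.d))
        self.w_t = 0.1 * rng.standard_normal((self.d, dim - self.d))

    def forward(self, x):
        """Map data x to latent code z and return log|det Jacobian|."""
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s = np.tanh(x1 @ self.w_s)   # bounded log-scale
        t = x1 @ self.w_t                # shift
        z2 = x2 * np.exp(log_s) + t
        return np.concatenate([x1, z2], axis=1), log_s.sum(axis=1)

    def inverse(self, z):
        """Map latent code z back to data space (inverse of forward)."""
        z1, z2 = z[:, :self.d], z[:, self.d:]
        log_s = np.tanh(z1 @ self.w_s)
        t = z1 @ self.w_t
        x2 = (z2 - t) * np.exp(-log_s)
        return np.concatenate([z1, x2], axis=1)


def log_likelihood(flow, x):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    z, log_det = flow.forward(x)
    log_pz = -0.5 * (z ** 2).sum(axis=1) - 0.5 * z.shape[1] * np.log(2 * np.pi)
    return log_pz + log_det


rng = np.random.default_rng(0)
flow = AffineCoupling(dim=8, rng=rng)
x = rng.standard_normal((4, 8))          # stand-in for 4 speech segments

z, _ = flow.forward(x)
x_rec = flow.inverse(z)
reconstruction_error = float(np.max(np.abs(x - x_rec)))

# Move every code along one (arbitrary) latent direction, then decode.
direction = np.zeros(8)
direction[-1] = 1.0
x_shifted = flow.inverse(z + 0.5 * direction)
```

Because the layer is exactly invertible, `reconstruction_error` is at floating-point precision, and the log-likelihood under the diagonal Gaussian prior is available in closed form. Decoding `z + 0.5 * direction` shows, in toy form, how editing a code along a direction yields a new point in data space; whether such a direction carries phonetic or speaker information is an empirical finding of the paper, not something this sketch demonstrates.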

