SD · AS · Jan 29, 2022

ItôWave: Itô Stochastic Differential Equation Is All You Need For Wave Generation

arXiv:2201.12519v2 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses audio generation for applications like speech synthesis, representing an incremental improvement over existing vocoders.

The paper tackles the problem of generating realistic audio from mel spectrograms by proposing ItôWave, a vocoder based on forward and reverse-time linear stochastic differential equations, which achieved a mean opinion score of 4.35±0.115, exceeding current state-of-the-art methods.

In this paper, we propose a vocoder based on a pair of forward and reverse-time linear stochastic differential equations (SDEs). The solutions of this SDE pair are two stochastic processes. The forward process turns the distribution of the waveforms we want to generate into a simple, tractable distribution; the reverse-time process is the generation procedure that turns a sample from this tractable distribution into the target waveform. The model is called ItôWave. ItôWave uses the Wiener process as a driver to gradually subtract the excess signal from the noise signal, generating realistic, meaningful audio conditioned on the original mel spectrogram. Experimental results show that the mean opinion score (MOS) of ItôWave reaches 4.35$\pm$0.115, exceeding current state-of-the-art (SOTA) methods. The generated audio samples are available online.
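The forward and reverse-time SDE pair described in the abstract can be illustrated with a toy sketch. This is not the paper's model. It assumes a one-dimensional Gaussian "data" distribution, a particular linear SDE (an Ornstein-Uhlenbeck process with constant rate `beta`), and an analytic score function standing in for the trained, mel-conditioned network. All function names and parameter values are hypothetical. The reverse-time SDE is integrated with plain Euler-Maruyama steps.

```python
import numpy as np

# Assumed forward linear SDE (illustrative choice, not taken from the paper):
#   dx = -0.5 * beta * x dt + sqrt(beta) dW
# Its reverse-time counterpart uses the score of the marginal density p_t:
#   dx = [-0.5 * beta * x - beta * score(x, t)] dt + sqrt(beta) dW_bar

def marginal(t, m0, s0, beta):
    """Mean and variance of x_t when x_0 ~ N(m0, s0^2)."""
    decay = np.exp(-beta * t)
    return m0 * np.sqrt(decay), s0**2 * decay + (1.0 - decay)

def score(x, t, m0, s0, beta):
    """Analytic grad_x log p_t(x); a stand-in for a learned score network."""
    mean, var = marginal(t, m0, s0, beta)
    return -(x - mean) / var

def forward_sample(x0, t, beta, rng):
    """Exact forward transition: data -> (nearly) standard Gaussian noise."""
    decay = np.exp(-beta * t)
    noise = rng.standard_normal(x0.shape)
    return x0 * np.sqrt(decay) + np.sqrt(1.0 - decay) * noise

def reverse_sample(n_samples, m0, s0, beta=2.0, T=5.0, n_steps=1000, seed=0):
    """Euler-Maruyama integration of the reverse-time SDE from t=T to t=0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.standard_normal(n_samples)  # start from the tractable prior
    for i in range(n_steps):
        t = T - i * dt
        drift = -0.5 * beta * x - beta * score(x, t, m0, s0, beta)
        # Stepping backwards in time flips the sign of the drift term.
        x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n_samples)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = 2.0 + 0.5 * rng.standard_normal(20000)   # toy "waveform" samples
    noised = forward_sample(data, 5.0, 2.0, rng)    # forward: data -> noise
    generated = reverse_sample(20000, m0=2.0, s0=0.5)  # reverse: noise -> data
    print("noised    mean/std:", noised.mean(), noised.std())
    print("generated mean/std:", generated.mean(), generated.std())
```

In this toy setting the forward process drives the samples toward a standard Gaussian, and the reverse process recovers samples whose mean and standard deviation are close to those of the original distribution. In a real vocoder the analytic `score` would be replaced by a neural network conditioned on the mel spectrogram, and `x` would be a full waveform rather than a scalar.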

