SD · AI · LG · AS · Jul 8, 2022

End-to-End Binaural Speech Synthesis

arXiv:2207.03697v1 · 15 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating realistic binaural audio for applications such as virtual reality. It appears incremental because it builds on existing VQ-VAE frameworks.

The researchers tackled binaural speech synthesis by developing an end-to-end system that combines a low-bitrate audio codec with a binaural decoder, achieving results that match ground truth data more closely than previous methods.

In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful binaural decoder that is capable of accurate speech binauralization while faithfully reconstructing environmental factors like ambient noise or reverb. The network is a modified vector-quantized variational autoencoder, trained with several carefully designed objectives, including an adversarial loss. We evaluate the proposed system on an internal binaural dataset with objective metrics and a perceptual study. Results show that the proposed approach matches the ground truth data more closely than previous methods. In particular, we demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
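The abstract describes the network as a modified vector-quantized variational autoencoder. As a rough illustration of the generic quantization step that gives such a codec its low bitrate, the sketch below maps each encoder output frame to its nearest codebook entry, so only the integer index needs to be transmitted. This is the textbook VQ lookup, not the authors' architecture; the function names, the toy codebook, and the frame values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Replace each frame with its nearest codebook vector.

    Returns the chosen indices (what a codec would transmit) and the
    quantized vectors (what a decoder would receive after the lookup).
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Nearest-neighbour search over the codebook entries.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Toy values, chosen only for illustration (not from the paper).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
frames = [[0.9, 1.2], [-0.8, 0.4], [0.1, -0.1]]

indices, quantized = quantize(frames, codebook)
print(indices)
print(quantized)
```

In a trained system the codebook is learned jointly with the encoder and decoder, and the decoder (here, the binaural decoder) consumes the quantized vectors. The sketch covers only the lookup, not training or the adversarial objective.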
