AS · LG · ML · Aug 13, 2020

Textual Echo Cancellation

arXiv:2008.06006v4 · 6 citations
AI Analysis

This work addresses a specific problem for users of intelligent devices such as smart speakers: it lets them talk while the device is still playing a response. It is, however, an incremental improvement over existing echo cancellation methods.

The paper tackles the problem of cancelling text-to-speech (TTS) playback echo from overlapping speech recordings, with the goal of improving speech recognition and user experience on smart devices. It predicts the enhanced audio with a sequence-to-sequence model whose multi-source attention draws on both the microphone mixture signal and the source text of the TTS playback. Because the text is far smaller than the playback audio, the approach reduces communication and latency compared with acoustic echo cancellation (AEC).
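To make the idea of multi-source attention more concrete, here is a minimal sketch in plain Python. It is not the paper's architecture, which this summary does not specify. It assumes scaled dot-product attention as a stand-in scoring function and assumes the two context vectors are combined by concatenation. The names `attend`, `multi_source_context`, `audio_memory`, and `text_memory` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Scaled dot-product attention of one query vector over one source.

    Assumption: dot-product scoring is used here only for illustration.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, row)) / scale for row in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * row[i] for w, row in zip(weights, memory)) for i in range(dim)]
    return context, weights

def multi_source_context(query, audio_memory, text_memory):
    """Attend to each source separately, then concatenate the two contexts.

    Assumption: concatenation is one simple way to merge the sources.
    """
    audio_ctx, audio_w = attend(query, audio_memory)
    text_ctx, text_w = attend(query, text_memory)
    return audio_ctx + text_ctx, audio_w, text_w

# Toy inputs: 3 encoded frames of the microphone mixture, 2 encoded text tokens.
query = [1.0, 0.0]
audio_memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_memory = [[0.0, 2.0], [2.0, 0.0]]

context, audio_w, text_w = multi_source_context(query, audio_memory, text_memory)
print(len(context), audio_w, text_w)
```

In this sketch the decoder state (`query`) produces one attention distribution per source, so the model can look at a different position in the text than in the audio at every decoding step.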

In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings. Such a system can largely improve speech recognition performance and user experience for intelligent devices such as smart speakers, as the user can talk to the device while the device is still playing the TTS signal responding to the previous query. We implement this system by using a novel sequence-to-sequence model with multi-source attention that takes both the microphone mixture signal and source text of the TTS playback as inputs, and predicts the enhanced audio. Experiments show that the textual information of the TTS playback is critical to enhancement performance. Besides, the text sequence is much smaller in size compared with the raw acoustic signal of the TTS playback, and can be immediately transmitted to the device or ASR server even before the playback is synthesized. Therefore, our proposed approach effectively reduces Internet communication and latency compared with alternative approaches such as acoustic echo cancellation (AEC).
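As a rough illustration of the size argument, the sketch below compares the byte count of a short response text with the byte count of the same response as raw audio. All numbers are assumptions made for the example, not figures from the paper: the sentence, its 4-second duration, and the 16 kHz, 16-bit mono PCM format are hypothetical. A deployed system would probably transmit compressed audio, which would narrow the gap.

```python
def pcm_bytes(duration_s, sample_rate_hz=16_000, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio; the default format is an assumption."""
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)

def text_bytes(text):
    """Size of the text when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# Hypothetical TTS response and an assumed spoken duration.
response = "The weather in Seattle today is sunny with a high of seventy degrees."
duration_s = 4.0

audio_size = pcm_bytes(duration_s)   # 4 s * 16000 samples/s * 2 bytes = 128000 bytes
text_size = text_bytes(response)
ratio = audio_size / text_size

print(f"audio: {audio_size} bytes, text: {text_size} bytes, ratio: {ratio:.0f}x")
```

Under these assumptions the text is smaller than the raw audio by about three orders of magnitude, which is consistent with the abstract's point that the text can be sent before the playback has even been synthesized.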

