CL | Nov 10, 2020

Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS

arXiv:2011.04845v2 | 0.00 | 16 citations
AI Analysis | 25

This work addresses the need for low-latency speech translation systems, but its contribution appears modest, since it mainly combines existing neural modules into a simultaneous framework.

The paper tackles simultaneous speech-to-speech translation by building a system from fully incremental neural modules for ASR, MT, and TTS. It evaluates the system's overall latency, measured as Ear-Voice Span and speaking latency, alongside module-level performance.

This paper presents a newly developed, simultaneous neural speech-to-speech translation system and its evaluation. The system consists of three fully-incremental neural processing modules for automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). We investigated its overall latency in the system's Ear-Voice Span and speaking latency along with module-level performance.

