End-to-End Spoken Language Translation
This work addresses the challenge of spoken language translation for cross-lingual communication, but its contribution is incremental because it builds on existing methods.
The paper tackles the problem of translating spoken sentences between languages using only spectrogram pairs, achieving competitive performance with state-of-the-art methods and generalization to unseen speakers.
In this paper, we address the task of spoken language understanding. We present a method for translating spoken sentences from one language into spoken sentences in another language. Given spectrogram-spectrogram pairs, our model can be trained completely from scratch to translate unseen sentences. Our method consists of a pyramidal-bidirectional recurrent network combined with a convolutional network to output sentence-level spectrograms in the target language. Empirically, our model achieves competitive performance with state-of-the-art methods on multiple languages and can generalize to unseen speakers.
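The abstract does not say how the pyramidal encoder shortens its input, so the following is only a minimal sketch under an assumption: that "pyramidal" means the common scheme in which each layer concatenates adjacent time frames, halving the sequence length the next recurrent layer has to process. The names `pyramid_reduce` and `pyramid_lengths` are hypothetical, and the recurrent layers and the convolutional decoder are left out.

```python
from typing import List

Frame = List[float]  # one spectrogram frame: a list of frequency-bin values


def pyramid_reduce(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Concatenate every `factor` consecutive frames into one wider frame.

    Assumption: this is the time-reduction step of a pyramidal encoder.
    Trailing frames that do not fill a complete group are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(frames) - len(frames) % factor
    reduced: List[Frame] = []
    for start in range(0, usable, factor):
        merged: Frame = []
        for frame in frames[start:start + factor]:
            merged.extend(frame)
        reduced.append(merged)
    return reduced


def pyramid_lengths(num_frames: int, num_layers: int, factor: int = 2) -> List[int]:
    """Sequence length at the input and after each pyramid layer."""
    lengths = [num_frames]
    for _ in range(num_layers):
        lengths.append(lengths[-1] // factor)
    return lengths


if __name__ == "__main__":
    # Five toy frames with two frequency bins each.
    toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(pyramid_reduce(toy))
    print(pyramid_lengths(800, 3))
```

Under this assumption, three such layers would shorten an 800-frame source spectrogram to 100 steps, which is the usual reason for a pyramid: the later recurrent layers run over far fewer time steps than the raw input.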