CL · May 11, 2021

Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation

arXiv:2105.04840v1717 citations
Originality: Incremental advance
AI Analysis

The paper addresses the tension between CTC's monotonicity assumption and the word reordering that speech translation requires. It offers insights for non-autoregressive methods but is incremental in scope.

The paper investigates whether CTC-based non-autoregressive models can handle word reordering in speech translation, finding that transformer encoders enable some reordering capability as measured by Kendall's tau distance.

We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC), and use CTC-based automatic speech recognition as an auxiliary task to improve the performance. CTC's success on translation is counter-intuitive due to its monotonicity assumption, so we analyze its reordering capability. Kendall's tau distance is introduced as the quantitative metric, and gradient-based visualization provides an intuitive way to take a closer look at the model. Our analysis shows that transformer encoders have the ability to change the word order, and it points to a direction for future research on non-autoregressive speech translation that is worth exploring further.
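To make the reordering metric concrete, here is a minimal sketch of a normalized Kendall's tau distance over a word-order permutation. It is an illustration only: the function name `kendall_tau_distance`, the choice to normalize by the total number of pairs, and the example alignments are assumptions, and the paper's exact procedure for obtaining alignments and normalizing the score is not given in this summary.

```python
from itertools import combinations


def kendall_tau_distance(perm):
    """Fraction of index pairs that appear in inverted order in `perm`.

    `perm[k]` is assumed to be the source position aligned to the k-th
    target word (a hypothetical alignment, not the paper's pipeline).
    Returns 0.0 for a fully monotonic order and 1.0 for a full reversal.
    """
    n = len(perm)
    if n < 2:
        return 0.0  # no pairs to compare
    # A pair (i, j) with i < j is discordant when the order is flipped.
    discordant = sum(
        1 for i, j in combinations(range(n), 2) if perm[i] > perm[j]
    )
    total_pairs = n * (n - 1) / 2
    return discordant / total_pairs


monotonic = kendall_tau_distance([0, 1, 2, 3])   # no reordering
reversed_ = kendall_tau_distance([3, 2, 1, 0])   # every pair inverted
one_swap = kendall_tau_distance([0, 2, 1, 3])    # one inverted pair out of six
print(monotonic, reversed_, one_swap)
```

Under this convention, a strictly monotonic output scores zero, so a larger distance between the source order and the model's output order would indicate more reordering.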

Code Implementations: 1 repo
