CL · SD · AS · Nov 9, 2022

Efficient Speech Translation with Pre-trained Models

arXiv:2211.04939v1 · 2 citations · h-index: 39
Originality: Incremental advance
AI Analysis

This work addresses the computational cost of building speech translation systems, a concern for both researchers and practitioners. The advance is incremental, since it builds on existing pre-trained models.

The paper tackles the high computational cost of speech translation by building cascaded and end-to-end systems from pre-trained models, so that the models can be trained and applied on a single GPU. A separate similarity loss then improves translation quality by 6 BLEU points when end-to-end training data is limited.

When building state-of-the-art speech translation models, the need for large computational resources is a significant obstacle due to the large training data size and complex models. The availability of pre-trained models is a promising opportunity to build strong speech translation systems efficiently. In a first step, we investigate efficient strategies to build cascaded and end-to-end speech translation systems based on pre-trained models. Using this strategy, we can train and apply the models on a single GPU. While the end-to-end models show superior translation performance to cascaded ones, their application is limited by the need for additional end-to-end training data. In a second step, we propose an additional similarity loss to encourage the model to generate similar hidden representations for speech and transcript. Using this technique, we can increase the data efficiency and improve the translation quality by 6 BLEU points in scenarios with limited end-to-end training data.
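
The abstract does not give the exact form of the similarity loss, so the following is only a minimal sketch of one plausible reading. It assumes that the speech and transcript hidden states are mean-pooled over time (they generally differ in length), that the distance is a mean squared error, and that the result is added to the translation loss with a scalar weight. The names `mean_pool`, `similarity_loss`, `total_loss` and `weight` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(states: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of hidden-state vectors over the time axis."""
    if not states:
        raise ValueError("need at least one hidden state")
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def similarity_loss(
    speech_states: Sequence[Sequence[float]],
    text_states: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between pooled speech and transcript states.

    Assumption: pooling plus MSE is an illustrative choice, not
    necessarily the formulation used in the paper.
    """
    speech_vec = mean_pool(speech_states)
    text_vec = mean_pool(text_states)
    if len(speech_vec) != len(text_vec):
        raise ValueError("hidden sizes must match")
    squared = [(a - b) ** 2 for a, b in zip(speech_vec, text_vec)]
    return sum(squared) / len(squared)


def total_loss(
    translation_loss: float,
    speech_states: Sequence[Sequence[float]],
    text_states: Sequence[Sequence[float]],
    weight: float = 1.0,  # hypothetical weighting factor
) -> float:
    """Translation loss plus the weighted similarity term."""
    return translation_loss + weight * similarity_loss(speech_states, text_states)


# Three speech frames and two transcript tokens, hidden size 2.
speech = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
text = [[2.0, 4.0], [2.0, 2.0]]
print(similarity_loss(speech, text))
print(total_loss(2.5, speech, text, weight=0.5))
```

In a real system the hidden states would come from the speech encoder and a text encoder and the loss would be computed on tensors with automatic differentiation. The plain-list version here only illustrates how the extra term pulls the two representations together.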
