CL, AI | Oct 16, 2022

RedApt: An Adaptor for wav2vec 2 Encoding: Faster and Smaller Speech Translation without Quality Compromise

Jinming Zhao, Hao Yang, Gholamreza Haffari, Ehsan Shareghi

arXiv:2210.08475v1 | 0.82 citations | h-index: 44

Originality: Incremental advance

AI Analysis

This work addresses efficiency issues in speech translation for users who need faster, smaller models without quality loss. It is an incremental advance because it builds on existing wav2vec 2 encoders.

The paper tackles the computational expense of pre-trained speech Transformers in speech translation by introducing RedApt, a Reducer Adaptor block. Integrated with the wav2vec 2 encoder, RedApt achieves a 41% speedup, a 33% memory reduction, and 24% fewer FLOPs at inference, while outperforming the state-of-the-art architecture by an average of 0.68 BLEU on 8 language pairs.

Pre-trained speech Transformers in speech translation (ST) have facilitated state-of-the-art (SotA) results; yet, using such encoders is computationally expensive. To improve this, we present a novel Reducer Adaptor block, RedApt, that could be seamlessly integrated within any Transformer-based speech encoding architecture. Integrating the pretrained wav2vec 2 speech encoder with RedApt brings 41% speedup, 33% memory reduction with 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU score on 8 language pairs from MuST-C.
