CL · SD · AS · Dec 21, 2023

Speech Translation with Large Language Models: An Industrial Practice

Zhichao Huang, Rong Ye, Tom Ko, Qianqian Dong, Shanbo Cheng, Mingxuan Wang, Hang Li

ByteDance

arXiv:2312.13585

Originality: Incremental advance

AI Analysis

This work targets speech translation for industrial applications; its originality is rated incremental because it builds on existing LLM-based methods.

The paper introduces LLM-ST, a speech translation model that couples a pre-trained large language model with a speech encoder and trains the combination with multi-task instruction tuning; it reports new benchmark results on English and Chinese datasets.
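To make "multi-task instruction tuning" concrete, here is a minimal sketch of how one annotated recording could be expanded into several instruction examples (plain transcription, translation, and timestamped transcription) that share the same audio. The task names, prompt wording, `Segment` type, and `[start-end]` timestamp syntax are all hypothetical illustrations, not the paper's actual data format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    source: str   # source-language transcription
    target: str   # target-language translation


def fmt_ts(seconds: float) -> str:
    # Hypothetical timestamp format: seconds with two decimals.
    return f"{seconds:.2f}"


def build_tasks(audio_id: str, segments: List[Segment], src_lang: str,
                tgt_lang: str, src_sep: str = " ",
                tgt_sep: str = " ") -> List[Dict[str, str]]:
    """Expand one annotated recording into several instruction-tuning examples.

    All examples point at the same audio; only the instruction and the
    expected response differ, which is what makes the tuning multi-task.
    Task names and prompt wording are hypothetical.
    """
    transcript = src_sep.join(s.source for s in segments)
    translation = tgt_sep.join(s.target for s in segments)
    timestamped = "\n".join(
        f"[{fmt_ts(s.start)}-{fmt_ts(s.end)}] {s.source}" for s in segments
    )
    return [
        {"audio": audio_id, "task": "asr",
         "instruction": f"Transcribe the {src_lang} audio.",
         "response": transcript},
        {"audio": audio_id, "task": "st",
         "instruction": f"Translate the {src_lang} audio into {tgt_lang}.",
         "response": translation},
        {"audio": audio_id, "task": "timestamped_asr",
         "instruction": f"Transcribe the {src_lang} audio with segment timestamps.",
         "response": timestamped},
    ]
```

For example, `build_tasks("clip_001", segs, "English", "Chinese", tgt_sep="")` yields three examples; an empty `tgt_sep` avoids inserting spaces between Chinese sentences. Sharing one recording across tasks is a common way to get more supervision per hour of labeled audio.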

Given the great success of large language models (LLMs) across various tasks, in this paper we introduce LLM-ST, a novel and effective speech translation model built upon a pre-trained LLM. By integrating the LLM with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Furthermore, our findings indicate that Chain-of-Thought (CoT) prompting can yield further gains for LLM-ST. Through rigorous experiments on English and Chinese datasets, we demonstrate the strong performance of LLM-ST, establishing a new benchmark in speech translation. Demo: https://speechtranslation.github.io/llm-st/.
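The abstract does not specify how CoT prompting is applied. One plausible reading for speech translation is to have the model write out the source transcription first, as an intermediate step, and only then produce the translation. The sketch below assumes that format. The `Transcription:`/`Translation:` labels and the `[start-end]` timestamp syntax are hypothetical. It builds such a target string and parses a generated response back into timestamped segments.

```python
import re
from typing import List, Tuple

# Hypothetical segment line format: "[start-end] text", times in seconds.
SEG_RE = re.compile(r"^\[(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)\]\s*(.*)$")

TRANSCRIPTION = "Transcription:"  # hypothetical section label
TRANSLATION = "Translation:"      # hypothetical section label


def build_cot_target(segments: List[Tuple[float, float, str, str]]) -> str:
    """Build a CoT-style target: transcription first, then translation."""
    lines = [TRANSCRIPTION]
    lines += [f"[{s:.2f}-{e:.2f}] {src}" for s, e, src, _ in segments]
    lines.append(TRANSLATION)
    lines += [f"[{s:.2f}-{e:.2f}] {tgt}" for s, e, _, tgt in segments]
    return "\n".join(lines)


def parse_cot_output(text: str) -> Tuple[List[Tuple[float, float, str]],
                                         List[Tuple[float, float, str]]]:
    """Split a generated CoT response into timestamped source and target segments."""
    sections = {TRANSCRIPTION: [], TRANSLATION: []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in sections:          # a section label switches the active list
            current = line
            continue
        m = SEG_RE.match(line)
        if m and current is not None:  # ignore lines outside any section
            sections[current].append((float(m.group(1)), float(m.group(2)), m.group(3)))
    return sections[TRANSCRIPTION], sections[TRANSLATION]
```

Putting the transcription first means the translation step can condition on text the model has already decoded, which is the usual intuition behind CoT in this setting. Keeping timestamps on both halves lets downstream tools, such as subtitle generators, align each translated segment with the audio.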

