CL · SD · AS · Dec 21, 2023

Speech Translation with Large Language Models: An Industrial Practice

ByteDance
arXiv:2312.13585v1 · 35 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses speech translation for industrial applications, but it is an incremental advance because it builds on existing LLM methods.

The paper tackles speech translation with LLM-ST, a model that couples a pre-trained large language model with a speech encoder and trains it with multi-task instruction tuning. LLM-ST sets a new benchmark on English and Chinese datasets.

Given the great success of large language models (LLMs) across various tasks, we introduce LLM-ST, a novel and effective speech translation model built on a pre-trained LLM. By integrating the LLM with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Furthermore, our findings indicate that Chain-of-Thought (CoT) prompting can benefit LLM-ST. Through rigorous experimentation on English and Chinese datasets, we demonstrate the exceptional performance of LLM-ST, establishing a new benchmark in the field of speech translation. Demo: https://speechtranslation.github.io/llm-st/.
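The abstract does not specify LLM-ST's prompt or output format. The following is a minimal sketch, under stated assumptions, of how multi-task instruction-tuning examples for a speech LLM might be built, including a CoT-style variant in which the model first writes the timestamped transcript and then the translation. The `<audio>` placeholder (where speech-encoder embeddings would be spliced in), the `<|mm:ss.ss|>` timestamp tokens, the instruction wording, and every function and task name here are hypothetical illustrations, not the authors' format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float        # segment start time in seconds
    end: float          # segment end time in seconds
    source_text: str    # transcription in the source language
    target_text: str    # translation in the target language


def fmt_ts(seconds: float) -> str:
    """Format seconds as a hypothetical <|mm:ss.ss|> timestamp token."""
    minutes, secs = divmod(seconds, 60)
    return f"<|{int(minutes):02d}:{secs:05.2f}|>"


# Hypothetical instructions, one per task; {tgt} is the target language.
INSTRUCTIONS = {
    "asr": "Transcribe the audio with timestamps.",
    "st": "Translate the audio into {tgt} with timestamps.",
    "cot_st": "First transcribe the audio with timestamps, "
              "then translate the transcript into {tgt}.",
}


def timestamped(segments: list[Segment], field: str) -> str:
    """Wrap each segment's text (source or target) in start/end timestamps."""
    return "".join(
        f"{fmt_ts(s.start)}{getattr(s, field)}{fmt_ts(s.end)}" for s in segments
    )


def build_example(task: str, segments: list[Segment], tgt_lang: str = "Chinese") -> dict:
    """Build one (prompt, target) pair; <audio> marks where speech features go."""
    if task not in INSTRUCTIONS:
        raise ValueError(f"unknown task: {task}")
    prompt = "<audio>" + INSTRUCTIONS[task].format(tgt=tgt_lang)
    if task == "asr":
        target = timestamped(segments, "source_text")
    elif task == "st":
        target = timestamped(segments, "target_text")
    else:  # "cot_st": transcript first, then translation conditioned on it
        target = (
            "Transcript: " + timestamped(segments, "source_text")
            + "\nTranslation: " + timestamped(segments, "target_text")
        )
    return {"prompt": prompt, "target": target}


# Usage with two made-up segments.
segs = [
    Segment(0.0, 1.5, "Hello everyone.", "大家好。"),
    Segment(1.5, 3.2, "Welcome back.", "欢迎回来。"),
]
example = build_example("cot_st", segs)
```

Mixing the three task types in one training set is one way to realize "multi-task" tuning; the `cot_st` target lets the translation be generated after, and conditioned on, the model's own transcript.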

