CL · AI · Feb 13, 2025

SparQLe: Speech Queries to Text Translation Through LLMs

arXiv:2502.09284v3 · 1 citation · h-index: 6 · Has Code · IWSLT
AI Analysis

This work aims to enable more seamless multi-modal processing and speech understanding in applications that rely on speech-to-text translation.

The study integrates speech representations with Large Language Models (LLMs) for speech-to-text translation. The resulting method preserves the semantic content of the input speech and acts as a bridge between self-supervised speech models and instruction-tuned LLMs.

With the growing influence of Large Language Models (LLMs), there is increasing interest in integrating speech representations with them to enable more seamless multi-modal processing and speech understanding. This study introduces a novel approach that combines self-supervised speech representations with instruction-tuned LLMs for speech-to-text translation. The proposed approach leverages a modality adapter to align extracted speech features with instruction-tuned LLMs using English speech data. Our experiments demonstrate that this method effectively preserves the semantic content of the input speech and serves as an effective bridge between self-supervised speech models and instruction-tuned LLMs, offering a promising approach for various speech understanding applications.
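The abstract does not say how the modality adapter is built, so the following is only a minimal sketch of one common design, assumed here for illustration: a fixed set of learned query vectors attends over the variable-length speech frames, and a linear projection maps each pooled vector into the LLM's embedding space. The names `adapt`, `queries`, and `projection`, the dimensions, and the random weights are all hypothetical and are not taken from the paper or its code.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adapt(speech_frames, queries, projection):
    """Map T speech frames (dim d) to len(queries) vectors of LLM dimension.

    Hypothetical adapter: single-head cross-attention pooling followed by
    a linear projection. Not the authors' implementation.
    """
    d = len(speech_frames[0])
    outputs = []
    for q in queries:
        # Attention weights of this query over every speech frame.
        weights = softmax([dot(q, f) / math.sqrt(d) for f in speech_frames])
        # Weighted average of the frames (still in speech-feature space).
        pooled = [sum(w * f[i] for w, f in zip(weights, speech_frames))
                  for i in range(d)]
        # Linear projection into the (assumed) LLM embedding space.
        outputs.append([dot(row, pooled) for row in projection])
    return outputs


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
SPEECH_DIM, LLM_DIM, NUM_QUERIES = 8, 16, 4  # toy sizes, chosen arbitrarily
queries = random_matrix(NUM_QUERIES, SPEECH_DIM, rng)   # stands in for learned queries
projection = random_matrix(LLM_DIM, SPEECH_DIM, rng)    # stands in for learned weights

short_utt = random_matrix(20, SPEECH_DIM, rng)  # 20 frames of fake speech features
long_utt = random_matrix(75, SPEECH_DIM, rng)   # 75 frames of fake speech features
short_tokens = adapt(short_utt, queries, projection)
long_tokens = adapt(long_utt, queries, projection)
print(len(short_tokens), len(short_tokens[0]))
print(len(long_tokens), len(long_tokens[0]))
```

Both utterances come out as the same number of vectors in the LLM dimension, regardless of their length. That fixed-size output is the property that would let such an adapter feed a frozen speech encoder's output to an instruction-tuned LLM as a short sequence of pseudo-token embeddings. In a real system the queries and projection would be trained rather than sampled at random.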

Code Implementations: 1 repo