CL, AI · Aug 29, 2025

SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings

arXiv:2509.04473v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of limited data and resources for speech and language integration, offering a solution for multi-task understanding in low-resource scenarios, though it is incremental in nature.

The paper tackles the problem of integrating speech encoders with large language models in low-resource settings. It proposes a parameter-efficient adapter and LLM-based synthetic dataset annotation, and reports a 26% relative WER improvement on ASR, a 6.3% relative F1 increase on NER, and a 32% relative F1 boost on SA.
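The figures above are relative changes, not absolute percentage points. As a minimal sketch of how such numbers are conventionally computed (the function names and input values are illustrative assumptions, not taken from the paper), note that WER is a lower-is-better metric while F1 is higher-is-better, so the sign convention differs:

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in WER (lower is better), as a percentage."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def relative_f1_gain(baseline_f1: float, new_f1: float) -> float:
    """Relative increase in F1 (higher is better), as a percentage."""
    return 100.0 * (new_f1 - baseline_f1) / baseline_f1


# Illustrative inputs only; these are NOT the paper's baseline scores.
wer_gain = relative_wer_improvement(baseline_wer=10.0, new_wer=7.4)
f1_gain = relative_f1_gain(baseline_f1=50.0, new_f1=66.0)

print(round(wer_gain, 1))  # 26.0
print(round(f1_gain, 1))   # 32.0
```

Under this convention, a drop from 10.0% to 7.4% WER is a 2.6-point absolute change but a 26% relative improvement.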

Integrating a speech encoder with an LLM requires substantial data and resources, yet many use cases are constrained by the limited availability of both. To address this, we propose a solution with a parameter-efficient adapter that converts speech embeddings into LLM-compatible tokens, focusing on end-to-end automatic speech recognition (ASR), named entity recognition (NER), and sentiment analysis (SA). To reduce labeling costs, we employ an LLM-based synthetic dataset annotation technique. The proposed adapter, using 7x fewer trainable parameters, achieves significant performance gains: a 26% relative Word Error Rate (WER) improvement on the LibriSpeech ASR task, a 6.3% relative F1 score increase on the NER task, and a 32% relative F1 score boost on the SA task. Moreover, using advanced techniques such as adding a classifier regularizer and optimizing the LLM with Low-Rank Adaptation (LoRA) yields notable performance gains, with Spoken Language Understanding Evaluation (SLUE) score improvements of 6.6% and 9.5%.
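The abstract does not describe the adapter's internals, so the following is only a rough sketch of one common way to map speech-encoder frames into an LLM's embedding space: stack neighbouring frames to shorten the sequence, then apply a single linear projection. The class name `SpeechAdapter`, the stacking factor `k`, and all dimensions are hypothetical choices for illustration and should not be read as the paper's design.

```python
import random


def stack_frames(frames, k):
    """Concatenate every k consecutive frames, shortening the sequence k-fold."""
    usable = len(frames) - len(frames) % k  # drop a ragged tail, if any
    return [
        [x for frame in frames[i:i + k] for x in frame]
        for i in range(0, usable, k)
    ]


def linear(vec, weight, bias):
    """Apply y = W x + b using plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


class SpeechAdapter:
    """Hypothetical adapter: frame stacking followed by one linear projection."""

    def __init__(self, enc_dim, llm_dim, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        in_dim = enc_dim * k  # stacked frames are k times wider
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                       for _ in range(llm_dim)]
        self.bias = [0.0] * llm_dim

    def num_trainable_params(self):
        """Weights plus biases of the single projection layer."""
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

    def __call__(self, frames):
        """Return one LLM-sized vector per group of k encoder frames."""
        return [linear(v, self.weight, self.bias)
                for v in stack_frames(frames, self.k)]


# Toy sizes (assumed): 5 encoder frames of width 4, LLM embedding width 6.
adapter = SpeechAdapter(enc_dim=4, llm_dim=6, k=2)
frames = [[float(t)] * 4 for t in range(5)]
tokens = adapter(frames)

print(len(tokens), len(tokens[0]))     # 2 6
print(adapter.num_trainable_params())  # 54
```

In a setup like this only the adapter's parameters would be trained while the encoder and LLM stay frozen, which is what makes the trainable-parameter count a natural point of comparison between adapter designs.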

