CL · LG · AS · May 12, 2025

Spoken Language Understanding on Unseen Tasks With In-Context Learning

DeepMind
arXiv:2505.07731v10.123 citations · h-index: 30 · Has Code · INTERSPEECH
AI Analysis 75

This work addresses the challenge of adapting speech-text large language models to new SLU tasks without labeled data, offering a practical solution for real-world applications where data annotation is costly or infeasible.

The paper tackles spoken language understanding (SLU) on unseen tasks, where task-specific training data is unavailable. It introduces a fine-tuning approach based on randomized class labels, which the authors report significantly improves performance over standard methods without requiring task-specific annotations.

Spoken language understanding (SLU) tasks involve diverse skills that probe the information extraction, classification and/or generation capabilities of models. In this setting, task-specific training data may not always be available. While traditional task-specific SLU models are unable to cater to such requirements, speech-text large language models (LLMs) offer a promising alternative with emergent abilities. However, our evaluations indicate that the out-of-the-box zero/few-shot performance of prominent open-source speech-text LLMs on SLU tasks is not up to the mark. In this paper, we introduce a novel approach to robust task-agnostic fine-tuning using randomized class labels. With this proposed fine-tuning, we illustrate that the performance of speech-text LLMs on an unseen task is significantly improved over standard approaches. Critically, the proposed approach avoids the requirement of task-specific data annotations for enabling new tasks in speech-text LLMs.
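
The abstract does not spell out how the randomized class labels are constructed, so the following is only a rough sketch of one plausible reading: within each few-shot episode, the real class names are swapped for arbitrary tokens, so a model has to infer the input-to-label mapping from the in-context examples rather than from the label semantics. Everything here is an assumption for illustration, including the function names `randomize_episode` and `build_prompt`, the prompt template, and the use of text transcripts as stand-ins for speech inputs. It is not the authors' implementation.

```python
import random
import string


def random_label(rng, length=4):
    # Hypothetical label format: a short random lowercase string.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def randomize_episode(examples, rng):
    """Replace each class label with a random token, consistently within one episode.

    examples: list of (utterance, label) pairs; the utterance is a text
    placeholder for what would be an audio input in a speech-text LLM.
    Returns the remapped examples and the label mapping that was used.
    """
    classes = sorted({label for _, label in examples})
    mapping = {}
    used = set()
    for cls in classes:
        token = random_label(rng)
        while token in used:  # keep random labels distinct across classes
            token = random_label(rng)
        used.add(token)
        mapping[cls] = token
    remapped = [(utt, mapping[label]) for utt, label in examples]
    return remapped, mapping


def build_prompt(demos, query):
    # Assumed template: demonstrations followed by an unlabeled query.
    blocks = [f"Input: {utt}\nLabel: {label}" for utt, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    rng = random.Random(0)
    episode = [
        ("play some jazz", "music"),
        ("what's the weather", "weather"),
        ("turn it up", "music"),
    ]
    remapped, mapping = randomize_episode(episode, rng)
    print(build_prompt(remapped[:2], remapped[2][0]))
    print("expected answer:", remapped[2][1])
```

In a fine-tuning loop under this reading, a fresh mapping would be drawn for every episode, so no fixed label vocabulary can be memorized.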

