AS | AI | SD | May 29

A Unified and Reproducible Experimentation Framework for Speech Understanding

arXiv:2605.30899 | 94.1 | h-index: 5
AI Analysis

SURE targets researchers and developers working on deployment-oriented speech understanding systems, with the goal of making evaluations comparable and training results reproducible.

This paper introduces SURE, a unified experimentation framework that addresses two problems in speech understanding that persist despite advances in speech foundation models and Speech LLMs: evaluations that cannot be compared because of mismatched post-processing, and training results that are hard to reproduce. SURE standardizes prediction formats, normalization, and scoring, and provides an agent-assisted training conversion flow that turns papers and code into versioned, runnable training pipelines.

Speech foundation models and Speech LLMs have advanced speech understanding, yet deployment-oriented model selection is hindered by non-comparable evaluations caused by mismatched post-processing, and by training results that are hard to reproduce across data scales and pipelines. We present SURE, a unified experimentation framework that standardizes prediction formats, normalization, and scoring. SURE evaluates strong systems across paradigms, from conventional pipelines to Speech LLMs, on representative tasks under realistic acoustic and linguistic stressors. Beyond evaluation, SURE introduces an agent-assisted training conversion flow that maps paper and code into versioned, runnable training pipelines under a unified protocol on matched open-data subsets. Overall, SURE improves comparability and reproducibility for deployment-oriented evaluation.
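
To make the post-processing problem concrete, the following is a minimal sketch of how one prediction can receive two different word error rates depending on whether text normalization is applied before scoring. It is not SURE's implementation: the function names (`normalize`, `word_errors`, `wer`) and the normalization rules (lowercasing, stripping ASCII punctuation, collapsing whitespace) are assumptions chosen only for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Illustrative normalizer (assumed rules, not SURE's):
    lowercase, strip ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def word_errors(ref_words, hyp_words) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str, normalizer=None) -> float:
    """Word error rate, optionally applying the same normalizer to both sides."""
    if normalizer is not None:
        reference, hypothesis = normalizer(reference), normalizer(hypothesis)
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref_words, hyp_words) / len(ref_words)


reference = "Turn on the kitchen lights."
hypothesis = "turn on the kitchen lights"

raw_wer = wer(reference, hypothesis)                               # no normalization
normalized_wer = wer(reference, hypothesis, normalizer=normalize)  # shared normalization

print(f"raw WER: {raw_wer:.2f}, normalized WER: {normalized_wer:.2f}")
```

In this toy case the unnormalized score counts casing and the trailing period as two word errors out of five, while the normalized score counts none. Two systems scored with different normalizers would therefore not be comparable, which is the kind of mismatch a shared scoring protocol is meant to remove.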
