SD · CL · AS · Aug 7, 2025

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription

arXiv:2508.05554v15 citations · h-index: 15 · INTERSPEECH
Originality Synthesis-oriented
AI Analysis

This paper provides a dataset for multi-speaker speech recognition in the financial domain. The contribution is incremental because it extends the existing SPGISpeech dataset.

The authors address speaker-tagged transcription of multi-speaker financial audio by introducing SPGISpeech 2.0, a dataset of 3,780 additional hours of transcribed earnings calls. Fine-tuning speech recognition models on the dataset improved their speaker-tagged ASR performance.

We introduce SPGISpeech 2.0, a dataset suitable for speaker-tagged transcription in the financial domain. SPGISpeech 2.0 improves the diversity of applicable modeling tasks while maintaining the core characteristic of the original SPGISpeech dataset: audio snippets and their corresponding fully formatted text transcriptions, usable for end-to-end automatic speech recognition (ASR). SPGISpeech 2.0 consists of 3,780 additional hours of professionally transcribed earnings calls. Furthermore, the dataset contains call and speaker information for each audio snippet, which facilitates multi-talker ASR. We validate the utility of SPGISpeech 2.0 through improvements in the speaker-tagged ASR performance of popular speech recognition models after fine-tuning on SPGISpeech 2.0. SPGISpeech 2.0 is released free for non-commercial use, and we expect it to foster advancements in speech recognition technologies and inspire a wide range of research applications.
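
As an illustration of how per-snippet call and speaker information could support speaker-tagged transcription, the sketch below groups snippets by call and merges consecutive snippets from the same speaker into tagged turns. It is a minimal sketch under stated assumptions: the `Snippet` record, its field names (`call_id`, `speaker_id`, `order`, `text`), and the `[speaker] text` output format are all hypothetical and are not taken from the SPGISpeech 2.0 release, whose actual schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    # Hypothetical schema: these field names are assumptions for illustration,
    # not the dataset's documented format.
    call_id: str      # which earnings call the snippet belongs to
    speaker_id: str   # which speaker is talking in the snippet
    order: int        # position of the snippet within its call
    text: str         # fully formatted transcription of the snippet


def speaker_tagged_transcripts(snippets):
    """Build one speaker-tagged transcript per call.

    Snippets are ordered within each call, and consecutive snippets
    from the same speaker are merged into a single turn.
    """
    turns_by_call = {}
    for s in sorted(snippets, key=lambda s: (s.call_id, s.order)):
        turns = turns_by_call.setdefault(s.call_id, [])
        if turns and turns[-1][0] == s.speaker_id:
            # Same speaker continues: extend the current turn.
            turns[-1] = (s.speaker_id, turns[-1][1] + " " + s.text)
        else:
            # Speaker change (or first snippet of the call): start a new turn.
            turns.append((s.speaker_id, s.text))
    return {
        call_id: "\n".join(f"[{spk}] {txt}" for spk, txt in turns)
        for call_id, turns in turns_by_call.items()
    }


if __name__ == "__main__":
    demo = [
        Snippet("call_a", "spk1", 1, "Thank you."),
        Snippet("call_a", "spk0", 0, "Good morning."),
        Snippet("call_a", "spk1", 2, "Revenue grew."),
    ]
    print(speaker_tagged_transcripts(demo)["call_a"])
```

A transcript in this form could serve as a target for fine-tuning or as a reference for scoring speaker-tagged output, though the paper's own training and evaluation setup is not described in the abstract.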

