SD · CL · AS · Aug 7, 2025

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription

Raymond Grossman, Taejin Park, Kunal Dhawan, Andrew Titus, Sophia Zhi, Yulia Shchadilova, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg

arXiv:2508.05554v1 · 5 citations · h-index: 15 · INTERSPEECH

Originality: Synthesis-oriented

AI Analysis

This paper provides a dataset for multi-speaker speech recognition in the financial domain. The contribution is incremental, since it extends the existing SPGISpeech dataset.

The authors address speaker-tagged transcription of multi-speaker financial audio by introducing SPGISpeech 2.0, a dataset of 3,780 additional hours of transcribed earnings calls. Fine-tuning on this dataset improved speaker-tagged ASR performance.

We introduce SPGISpeech 2.0, a dataset suitable for speaker-tagged transcription in the financial domain. SPGISpeech 2.0 improves the diversity of applicable modeling tasks while maintaining the core characteristic of the original SPGISpeech dataset: audio snippets and their corresponding fully formatted text transcriptions, usable for end-to-end automatic speech recognition (ASR). SPGISpeech 2.0 consists of 3,780 additional hours of professionally transcribed earnings calls. Furthermore, the dataset contains call and speaker information for each audio snippet, facilitating multi-talker ASR. We validate the utility of SPGISpeech 2.0 through improvements in the speaker-tagged ASR performance of popular speech recognition models after fine-tuning on SPGISpeech 2.0. Because SPGISpeech 2.0 is released free for non-commercial use, we expect it to foster advancements in speech recognition technologies and inspire a wide range of research applications.
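The abstract says each snippet carries call and speaker information. The sketch below shows one plausible way such metadata could be turned into a speaker-tagged target transcript for a whole call. It groups snippets by call, orders them in time, and inserts a speaker tag whenever the speaker changes. The record fields (call_id, speaker_id, start_sec, text), the Snippet class, the function name, and the "<spkN>" tag format are all hypothetical. They are not the published SPGISpeech 2.0 schema or the authors' training format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Snippet:
    # Hypothetical per-snippet record; the real SPGISpeech 2.0 fields may differ.
    call_id: str      # which earnings call the snippet belongs to
    speaker_id: str   # speaker label attached to the snippet
    start_sec: float  # assumed offset of the snippet within the call
    text: str         # fully formatted transcription of the snippet


def speaker_tagged_transcripts(snippets):
    """Build one speaker-tagged transcript per call (hypothetical "<spkN>" tags)."""
    by_call = defaultdict(list)
    for s in snippets:
        by_call[s.call_id].append(s)

    transcripts = {}
    for call_id, items in by_call.items():
        items.sort(key=lambda s: s.start_sec)  # restore chronological order
        tags = {}   # speaker_id -> call-local tag, numbered by first appearance
        parts = []
        prev_tag = None
        for s in items:
            tag = tags.setdefault(s.speaker_id, f"<spk{len(tags)}>")
            if tag != prev_tag:  # emit a tag only when the speaker changes
                parts.append(tag)
                prev_tag = tag
            parts.append(s.text)
        transcripts[call_id] = " ".join(parts)
    return transcripts


snips = [
    Snippet("call1", "CEO", 12.0, "Revenue grew 8%."),
    Snippet("call1", "Analyst", 30.5, "What drove margins?"),
    Snippet("call1", "CEO", 4.0, "Good morning, everyone."),
    Snippet("call1", "CEO", 35.0, "Mostly pricing."),
]
print(speaker_tagged_transcripts(snips)["call1"])
# <spk0> Good morning, everyone. Revenue grew 8%. <spk1> What drove margins? <spk0> Mostly pricing.
```

Tags are numbered per call rather than reusing global speaker IDs. This keeps the target independent of who the speakers are and depends only on the order in which they first speak.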

