AS | AI | CL | Jul 15, 2025

Towards Robust Speech Recognition for Jamaican Patois Music Transcription

arXiv:2507.16834v1 | 2 citations | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses accessibility issues for Jamaican Patois music listeners and creators, but its contribution is incremental: it applies existing methods to new data.

The researchers address poor speech recognition for Jamaican Patois music by curating more than 40 hours of manually transcribed audio and fine-tuning state-of-the-art ASR models on it. They then use the fine-tuning results to develop scaling laws for the performance of Whisper models on Jamaican Patois audio.

Although Jamaican Patois is a widely spoken language, current speech recognition systems perform poorly on Patois music, producing inaccurate captions that limit accessibility and hinder downstream applications. In this work, we take a data-centric approach to this problem by curating more than 40 hours of manually transcribed Patois music. We use this dataset to fine-tune state-of-the-art automatic speech recognition (ASR) models, and use the results to develop scaling laws for the performance of Whisper models on Jamaican Patois audio. We hope that this work will have a positive impact on the accessibility of Jamaican Patois music and the future of Jamaican Patois language modeling.
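
As a rough illustration of what fitting a data scaling law can look like, the sketch below fits a power law, WER ≈ a · hours^(-b), by linear regression in log-log space. The abstract does not state which functional form the authors use, so the power-law form, the function names (fit_power_law, predict_wer), and the data points are all assumptions made for illustration. The points are synthetic, generated from known parameters, and are not results from the paper.

```python
import numpy as np


def fit_power_law(hours, wer):
    """Fit wer ~= a * hours**(-b) by least squares in log-log space.

    Assumed functional form for illustration only; the paper's actual
    scaling-law form is not given in the abstract.
    """
    log_h = np.log(np.asarray(hours, dtype=float))
    log_w = np.log(np.asarray(wer, dtype=float))
    # log(wer) = log(a) - b * log(hours), so the slope is -b.
    slope, intercept = np.polyfit(log_h, log_w, 1)
    return float(np.exp(intercept)), float(-slope)


def predict_wer(a, b, hours):
    """Evaluate the fitted power law at the given training-set sizes."""
    return a * np.asarray(hours, dtype=float) ** (-b)


# Synthetic, noise-free points generated from a=0.9, b=0.25.
# These are NOT measurements from the paper.
synthetic_hours = np.array([5.0, 10.0, 20.0, 40.0])
synthetic_wer = 0.9 * synthetic_hours ** (-0.25)

a, b = fit_power_law(synthetic_hours, synthetic_wer)
print(f"a={a:.3f}, b={b:.3f}")
print("extrapolated WER at 80 hours:", float(predict_wer(a, b, 80.0)))
```

With real measurements, one would pass the training-set sizes and the corresponding word error rates of each fine-tuned model; a separate fit per Whisper model size would show how the exponent varies with capacity.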

