BlasBench: An Open Benchmark for Irish Speech Recognition
For Irish ASR researchers, this benchmark enables reliable comparison and reveals a generalisation gap missed by single-dataset evaluation.
Existing multilingual benchmarks fail for Irish because they lack Irish-aware text normalisation. BlasBench provides a normaliser and a scoring harness. With them, Whisper variants exceed 100% WER, Microsoft Azure achieves 22.2% WER on Common Voice, and models fine-tuned on Common Voice degrade by 33-43 WER points on FLEURS while massively multilingual models degrade by only 7-10 points.
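A WER above 100% is possible because of how WER is defined. Under the standard definition (assumed here, since this section does not show the harness's scoring code), the error count is divided by the number of reference words $N$, and insertions are not bounded by $N$:

$$
\mathrm{WER} = \frac{S + D + I}{N}, \qquad S + D \le N, \quad I \ge 0
$$

Here $S$, $D$ and $I$ are the substitutions, deletions and insertions in a minimum-cost word alignment. As an illustrative example (not a measured result), suppose a 10-word reference is transcribed correctly and then followed by 15 hallucinated words. That gives $S = D = 0$ and $I = 15$, so $\mathrm{WER} = 15/10 = 150\%$.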
Existing multilingual benchmarks include Irish among dozens of languages but apply no Irish-aware text normalisation, which makes reliable, reproducible ASR comparison impossible. We introduce BlasBench, an open benchmark that provides three things:
- a standalone Irish-aware normaliser that preserves fadas, lenition, and eclipsis;
- a reproducible scoring harness;
- per-utterance predictions released for all evaluated runs.

We pilot BlasBench by benchmarking 12 systems across four architecture families on Common Voice ga-IE and FLEURS ga-IE. All Whisper variants exceed 100% WER through insertion-driven hallucination. Microsoft Azure reaches 22.2% WER on Common Voice and 57.5% on FLEURS. The best open model, Omnilingual ASR 7B, reaches 30.65% and 39.09%, respectively. Models fine-tuned on Common Voice degrade by 33-43 WER points when moving to FLEURS, while massively multilingual models degrade by only 7-10 points. Single-dataset evaluation misses this generalisation gap.
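The normaliser's exact rules are not listed here. The following is therefore only a minimal Python sketch, assuming a plausible rule set, of what "Irish-aware" can mean in practice. The names `normalise_irish`, `word_errors` and `wer` are hypothetical and are not BlasBench's API. The sketch applies these rules:
- It keeps fadas by composing text to NFC instead of stripping diacritics.
- It leaves lenition (`bh`, `ch`, `fh`) alone, because lenition is written with ordinary letters.
- It keeps word-internal apostrophes (`d'fhág`) and hyphens (`n-athair`, `t-uisce`).
- It rewrites eclipsis or prothetic t before a capital vowel (`nÓg`, `tUisce`) into the hyphenated lowercase spelling.

Scoring is assumed to be corpus-level WER: total word edits divided by total reference words.

```python
import re
import unicodedata

# Hypothetical rule set for illustration; not BlasBench's actual normaliser.
# Map typographic apostrophes and hyphens to their ASCII forms.
_TYPOGRAPHIC = str.maketrans({"\u2018": "'", "\u2019": "'", "\u2010": "-", "\u2011": "-"})
# Eclipsis (n) or prothetic t directly before a capital vowel, e.g. "nÓg", "tUisce".
_PREFIX_BEFORE_CAPITAL = re.compile(r"\b([nt])([AEIOUÁÉÍÓÚ])")
# Apostrophes or hyphens that are not between two word characters.
_EDGE_MARKS = re.compile(r"(?<!\w)['-]|['-](?!\w)")


def normalise_irish(text: str) -> str:
    # NFC keeps each fada as one precomposed letter ("á") instead of removing it.
    text = unicodedata.normalize("NFC", text).translate(_TYPOGRAPHIC)
    # "nÓg" -> "n-Óg", so lowercasing yields the standard spelling "n-óg".
    text = _PREFIX_BEFORE_CAPITAL.sub(r"\1-\2", text).lower()
    # Lenition ("bh", "ch", "fh") is plain letters, so it passes through untouched.
    # Punctuation and symbols become spaces, except apostrophes and hyphens.
    text = "".join(
        " " if unicodedata.category(ch)[0] in "PS" and ch not in "'-" else ch
        for ch in text
    )
    # Keep word-internal marks ("d'fhág", "n-athair"); drop quotes and dashes.
    text = _EDGE_MARKS.sub(" ", text)
    return " ".join(text.split())


def word_errors(ref: str, hyp: str) -> tuple[int, int]:
    """Return (S + D + I, N) from a word-level Levenshtein alignment."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # aligning zero reference words: all insertions
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(
                prev[j] + 1,               # deletion of reference word rw
                cur[j - 1] + 1,            # insertion of hypothesis word hw
                prev[j - 1] + (rw != hw),  # substitution, or match at zero cost
            )
        prev = cur
    return prev[-1], len(r)


def wer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level WER over normalised text (assumed aggregation)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = words = 0
    for ref, hyp in zip(refs, hyps):
        e, n = word_errors(normalise_irish(ref), normalise_irish(hyp))
        errors += e
        words += n
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


if __name__ == "__main__":
    print(normalise_irish("D’fhág sé “Tír na nÓg”!"))  # d'fhág sé tír na n-óg
    # Six hallucinated words against a two-word reference: 6 / 2.
    print(wer(["dia duit"], ["Dia duit, conas atá tú inniu a chara?"]))  # 3.0
```

Without rules like these, two problems appear. Orthographically equivalent spellings such as `nÓg` and `n-óg` are scored as substitutions. Stripping fadas or hyphens can also merge distinct words, such as `n-athair` and `nathair` ("snake"). The second call shows how insertion-driven hallucination alone pushes WER past 100%.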