CLAIFeb 26

Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis

arXiv:2603.19259h-index: 10
AI Analysis

This provides a replicable evaluation framework for Taiwanese Hokkien speech technology, with methodologies applicable to other linguistic contexts.

The authors tackled the problem of evaluating Taiwanese Hokkien speech recognition and synthesis systems by introducing Breeze Taigi, a framework with standardized benchmarks and reference models. Their fine-tuned Whisper model achieved 30.13% average Character Error Rate, outperforming existing systems.

Taiwanese Hokkien (Taigi) presents unique opportunities for advancing speech technology methodologies that can generalize to diverse linguistic contexts. We introduce Breeze Taigi, a comprehensive framework centered on standardized benchmarks for evaluating Taigi speech recognition and synthesis systems. Our primary contribution is a reproducible evaluation methodology that leverages parallel Taiwanese Mandarin resources. We provide 30 carefully curated Mandarin-Taigi audio pairs from Taiwan's Executive Yuan public service announcements with normalized ground truth transcriptions. We establish Character Error Rate (CER) as the standard metric and implement normalization procedures to enable fair cross-system comparisons. To demonstrate the benchmark's utility and provide reference implementations, we develop speech recognition and synthesis models through a methodology that leverages existing Taiwanese Mandarin resources and large-scale synthetic data generation. In particular, we fine-tune a Whisper model on approximately 10,000 hours of Taigi synthetic speech data. Our ASR model achieves 30.13% average CER on the benchmark, outperforming existing commercial and research systems. By providing standardized evaluation protocols, diverse training datasets, and open baseline models, we offer a replicable framework with methodologies applicable to various linguistic contexts.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes