CL | SD | AS | Dec 31, 2024

Fotheidil: an Automatic Transcription System for the Irish Language

arXiv:2501.00509v1 | 19 citations | h-index: 24 | COLING Workshops
Originality: Incremental advance
AI Analysis

Fotheidil provides a freely available tool for researchers and others transcribing Irish-language materials, though the advance is incremental because it builds on existing speech AI technologies.

The authors developed Fotheidil, the first web-based transcription system for Irish, which uses semi-supervised learning to improve acoustic models for out-of-domain data and dialects, and a novel sequence-to-sequence approach for capitalisation and punctuation restoration, showing substantial performance gains.

This paper presents Fotheidil, the first web-based transcription system for the Irish language, which uses speech-related AI technologies as part of the ABAIR initiative. The system includes both off-the-shelf pre-trained voice activity detection and speaker diarisation models and models trained specifically for Irish automatic speech recognition and for capitalisation and punctuation restoration. Semi-supervised learning is explored to improve the acoustic model of a modular TDNN-HMM ASR system, yielding substantial improvements for out-of-domain test sets and for dialects that are underrepresented in the supervised training set. A novel approach to capitalisation and punctuation restoration involving sequence-to-sequence models is compared with the conventional approach using a classification model. Experimental results here also show substantial improvements in performance. The system will be made freely available for public use, and represents an important resource for researchers and others who transcribe Irish-language materials. Human-corrected transcriptions will be collected and included in the training dataset as the system is used, which should lead to incremental improvements to the ASR model in a cyclical, community-driven fashion.
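The difference between the two restoration framings mentioned in the abstract can be illustrated with a minimal sketch of how training examples might be built from one punctuated sentence. The label names, the `strip_token` and `make_examples` helpers, and the first-letter capitalisation scheme are all assumptions made for illustration; the abstract does not specify the label inventory or data preparation the authors used.

```python
# Hypothetical punctuation label inventory (an assumption, not taken from the paper).
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION", "!": "EXCLAM"}


def strip_token(token):
    """Return (bare lowercase word, capitalisation label, punctuation label)."""
    punct = "NONE"
    if token and token[-1] in PUNCT:
        punct = PUNCT[token[-1]]
        token = token[:-1]
    # Simplified scheme: only the first character is inspected.
    cap = "UPPER_FIRST" if token[:1].isupper() else "LOWER"
    return token.lower(), cap, punct


def make_examples(sentence):
    """Build both views of one training example from a punctuated sentence."""
    parts = [strip_token(t) for t in sentence.split()]
    return {
        # ASR-like input: lowercase, no punctuation.
        "source": " ".join(word for word, _, _ in parts),
        # Classification framing: one (capitalisation, punctuation) label per token.
        "classification_labels": [(cap, punct) for _, cap, punct in parts],
        # Sequence-to-sequence framing: the target is simply the original text.
        "seq2seq_target": sentence,
    }


if __name__ == "__main__":
    example = make_examples("Dia duit, a Sheáin. Conas atá tú?")
    print(example["source"])
    print(example["classification_labels"])
    print(example["seq2seq_target"])
```

In the classification view, every output pattern must be covered by a fixed label set, whereas the sequence-to-sequence view generates the formatted text directly. For instance, the first-letter scheme assumed here cannot represent Irish forms such as "hÉireann", where the capital follows a lowercase prefix, without adding further labels.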

