CL | SD | AS | Nov 30, 2022

EURO: ESPnet Unsupervised ASR Open-source Toolkit

arXiv:2211.17196v3 | 10 citations | h-index: 83 | Has Code
Originality: Synthesis-oriented
AI Analysis

This toolkit addresses the need for accessible and reproducible tools in the emerging UASR research area, though it is incremental as it builds on existing methods.

The paper introduces EURO, an open-source toolkit for unsupervised automatic speech recognition (UASR) that integrates the Wav2vec-U learning method and reports state-of-the-art UASR performance on the TIMIT and LibriSpeech datasets.

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by Wav2vec-U, originally implemented in FAIRSEQ, which leverages self-supervised speech representations and adversarial training. Beyond wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, which provide flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on the TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, with the aim of promoting this emerging UASR research area through open-source activity.

Code Implementations: 1 repo
