CL, SD, AS · Oct 5, 2022

JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

arXiv:2210.02545
Originality: Synthesis-oriented
AI Analysis

JoeyS2T offers an accessible tool for researchers and practitioners in speech processing, but the contribution is incremental, since it builds on an existing framework, JoeyNMT.

The authors tackled speech-to-text tasks by extending JoeyNMT into JoeyS2T, a minimalist toolkit that performs competitively on English speech recognition and English-to-German speech translation benchmarks.

JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch that prioritizes simplicity and accessibility. JoeyS2T's workflow is self-contained, covering data pre-processing, model training, prediction, and evaluation, and it is seamlessly integrated into JoeyNMT's compact and simple code base. On top of JoeyNMT's state-of-the-art Transformer-based encoder-decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC loss, and WER evaluation. Despite its simplicity compared to prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and is available at https://github.com/may-/joeys2t.
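The abstract lists WER evaluation among JoeyS2T's speech-oriented components. As a rough illustration of what that metric computes, here is a minimal Python sketch of word error rate under common assumptions: whitespace tokenization, no text normalization, and the standard word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The function name `word_error_rate` is hypothetical; this is not JoeyS2T's own implementation, which may normalize or tokenize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / number of reference words.

    Assumes plain whitespace tokenization and no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Because insertions are counted against the reference length, WER can exceed 1.0; for example, a one-word reference scored against a three-word hypothesis that contains it yields 2.0.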
