CL SD AS
Jul 24, 2021

Differentiable Allophone Graphs for Language-Universal Speech Recognition

Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe

arXiv:2107.11628v12.213 citations

Originality: Incremental advance

AI Analysis

The framework lets linguists document new languages and build phone-based lexicons. The advance is incremental because it still relies on existing phonemic transcriptions and phone-to-phoneme mappings.

The authors address language-universal speech recognition by deriving phone-level supervision from phonemic transcriptions and phone-to-phoneme mappings. The resulting model is trained on 7 diverse languages and learns interpretable probabilistic phone-to-phoneme mappings for each language.

Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages. While speech annotations at the language-specific phoneme or surface levels are readily available, annotations at a universal phone level are relatively rare and difficult to produce. In this work, we present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings with learnable weights represented using weighted finite-state transducers, which we call differentiable allophone graphs. By training multilingually, we build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language. These phone-based systems with learned allophone graphs can be used by linguists to document new languages, build phone-based lexicons that capture rich pronunciation variations, and re-evaluate the allophone mappings of seen languages. We demonstrate the aforementioned benefits of our proposed framework with a system trained on 7 diverse languages.
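
The idea of a phone-to-phoneme mapping with learnable weights can be illustrated with a small sketch. The code below is not the authors' implementation: it replaces the weighted finite-state transducer with a plain dictionary of arcs, and the toy inventory (`ALLOWED`, `PHONEMES`), the softmax normalization per phone, and all function names are assumptions made for illustration. It shows how per-frame phone probabilities could be pushed through weighted arcs to obtain phoneme probabilities, which is the level at which phonemic transcriptions can supervise the model.

```python
import math

# Hypothetical toy inventory for one language (an assumption, not from the paper):
# each phone lists the phonemes it is allowed to realize.
ALLOWED = {
    "p": ["/p/"],
    "p_h": ["/p/"],          # aspirated p, an allophone of /p/
    "t": ["/t/"],
    "flap": ["/t/", "/d/"],  # a flap can realize either /t/ or /d/
    "d": ["/d/"],
}
PHONEMES = ["/p/", "/t/", "/d/"]


def arc_weights(logits):
    """Normalize learnable logits with a softmax over each phone's allowed arcs."""
    weights = {}
    for phone, targets in ALLOWED.items():
        exps = [math.exp(logits[(phone, q)]) for q in targets]
        total = sum(exps)
        weights[phone] = {q: e / total for q, e in zip(targets, exps)}
    return weights


def phoneme_posterior(phone_probs, logits):
    """Map a phone distribution to a phoneme distribution through weighted arcs."""
    weights = arc_weights(logits)
    out = {q: 0.0 for q in PHONEMES}
    for phone, prob in phone_probs.items():
        for q, w in weights[phone].items():
            out[q] += prob * w
    return out


# Zero logits give a uniform split over each phone's allowed phonemes.
logits = {(phone, q): 0.0 for phone, targets in ALLOWED.items() for q in targets}
phone_probs = {"p": 0.1, "p_h": 0.2, "t": 0.3, "flap": 0.3, "d": 0.1}
result = phoneme_posterior(phone_probs, logits)
```

Because every step is a sum of products of differentiable terms, a phoneme-level loss computed on `result` could in principle be backpropagated to both the arc logits and the phone probabilities in an autodiff framework. The normalized arc weights are what would be read off as a probabilistic phone-to-phoneme mapping.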
