CL · Mar 31, 2020

A Swiss German Dictionary: Variation in Speech and Writing

arXiv:2004.00139v1998 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of standardizing Swiss German for automated speech recognition systems. The advance is incremental, as it builds on existing translation and phonetic resources.

The authors tackled the significant variation in written Swiss German by creating a dictionary that pairs Swiss German dialect words with High German translations and phonetic transcriptions. They demonstrated its utility by training Transformer models for phoneme-to-grapheme and grapheme-to-phoneme conversion, in support of automated speech recognition systems.

We introduce a dictionary containing forms of common words in various Swiss German dialects normalized into High German. As Swiss German is, for now, a predominantly spoken language, there is significant variation in the written forms, even between speakers of the same dialect. To alleviate the uncertainty associated with this diversity, we complement the pairs of Swiss German - High German words with the Swiss German phonetic transcriptions (SAMPA). This dictionary thus becomes the first resource to combine large-scale spontaneous translation with phonetic transcriptions. Moreover, we control for the regional distribution and ensure the equal representation of the major Swiss dialects. The coupling of the phonetic and written Swiss German forms is powerful. We show that they are sufficient to train a Transformer-based phoneme-to-grapheme model that generates credible novel Swiss German writings. In addition, we show that the inverse mapping, from graphemes to phonemes, can be modeled with a Transformer trained with the novel dictionary. This generation of pronunciations for previously unknown words is essential for training extensible automated speech recognition (ASR) systems, which are key beneficiaries of this dictionary.
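As a rough illustration of the data described in the abstract, the sketch below models one dictionary entry and derives training pairs for the two mappings. It is a minimal sketch under stated assumptions, not the authors' format or pipeline: the `Entry` fields, the helper names, the space-separated SAMPA convention, and the two sample rows are all hypothetical placeholders and are not taken from the dictionary itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary row (hypothetical layout, assumed for illustration)."""
    high_german: str   # normalized High German form
    swiss_german: str  # one written Swiss German variant
    sampa: str         # SAMPA transcription, phonemes separated by spaces (assumed)
    dialect: str       # region label (assumed)


def g2p_pairs(entries):
    """Return (grapheme sequence, phoneme sequence) pairs: source -> target for G2P."""
    return [(list(e.swiss_german), e.sampa.split()) for e in entries]


def p2g_pairs(entries):
    """Return the inverse pairs: phoneme sequence -> grapheme sequence for P2G."""
    return [(phonemes, graphemes) for graphemes, phonemes in g2p_pairs(entries)]


def variants_by_lemma(entries):
    """Group the written Swiss German variants under their High German form."""
    grouped = defaultdict(set)
    for e in entries:
        grouped[e.high_german].add(e.swiss_german)
    return dict(grouped)


# Made-up placeholder rows, NOT drawn from the actual dictionary.
SAMPLE = [
    Entry("Haus", "Huus", "h u: s", "Zürich"),
    Entry("Haus", "Hus", "h u: s", "Bern"),
]

if __name__ == "__main__":
    print(g2p_pairs(SAMPLE)[0])
    print(variants_by_lemma(SAMPLE))
```

Keeping both directions derivable from the same rows reflects the abstract's point that the coupled phonetic and written forms can serve either mapping; how the actual models tokenize their inputs is not specified here.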
