Dialectal Speech Recognition and Translation of Swiss German Speech to Standard German Text: Microsoft's Submission to SwissText 2021
This work addresses dialectal speech recognition and translation for Swiss German speakers, which can improve communication and accessibility. The contribution is incremental, since it builds on existing hybrid ASR methods.
The paper tackles the recognition and translation of Swiss German speech into Standard German text, a challenging task because of significant dialectal differences and the lack of a standardized script. The system achieves 46.04% BLEU on a blind test set and outperforms the second-best competitor by a 12% relative margin.
This paper describes the winning approach in Shared Task 3 at SwissText 2021 on Swiss German Speech to Standard German Text, a public competition on dialect recognition and translation. Swiss German refers to the multitude of Alemannic dialects spoken in the German-speaking parts of Switzerland. Swiss German differs significantly from Standard German in pronunciation, word inventory and grammar. It is mostly incomprehensible to native German speakers. Moreover, it lacks a standardized written script. To solve this challenging task, we propose a hybrid automatic speech recognition system with a lexicon that incorporates translations, a first-pass language model that deals with Swiss German particularities, a transfer-learned acoustic model and a strong neural language model for second-pass rescoring. Our submission reaches 46.04% BLEU on a blind conversational test set and outperforms the second-best competitor by a 12% relative margin.
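To make the idea of second-pass rescoring concrete, the following is a minimal sketch of n-best rescoring in Python. It is not the submission's implementation: the abstract does not say how first-pass and neural language model scores are combined, so the linear interpolation, the `lm_weight` parameter, the `Hypothesis` class and all scores below are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str                # candidate Standard German transcript
    first_pass_score: float  # hypothetical log-score from the first pass
    neural_lm_score: float   # hypothetical log-probability from a neural LM


def rescore(nbest: List[Hypothesis], lm_weight: float = 0.5) -> List[Hypothesis]:
    """Sort an n-best list by an interpolated score, best hypothesis first.

    Assumption: a simple linear interpolation of the two log-scores.
    """
    def combined(h: Hypothesis) -> float:
        return (1.0 - lm_weight) * h.first_pass_score + lm_weight * h.neural_lm_score

    return sorted(nbest, key=combined, reverse=True)


# Made-up scores, for illustration only.
nbest = [
    Hypothesis("ich habe das nicht gewusst", -12.0, -9.0),
    Hypothesis("ich habe es nicht gewusst", -11.5, -10.5),
    Hypothesis("ich hab das nicht gewusst", -11.0, -14.0),
]

first_pass_best = rescore(nbest, lm_weight=0.0)[0]  # first-pass score only
rescored_best = rescore(nbest, lm_weight=0.5)[0]    # equal weight on both scores

print(first_pass_best.text)
print(rescored_best.text)
```

With these invented numbers, the hypothesis ranked first by the first pass alone differs from the one ranked first after interpolation, which is the effect rescoring is meant to have: a stronger language model can promote a candidate that the first pass ranked lower.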