CL · Oct 22, 2020

Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification

Badr M. Abdullah, Jacek Kudera, Tania Avgustinova, Bernd Möbius, Dietrich Klakow

arXiv:2010.11973v131.0991 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses language identification for Slavic languages. Its contribution is incremental: it analyzes the representations that emerge in the model rather than introducing a new method.

The authors train a neural model to identify Slavic languages in speech signals and find that perceptual confusability between languages, rather than objective language relatedness, is the best predictor of representation similarity.

Deep neural networks have been employed for various spoken language recognition tasks, including tasks that are multilingual by definition such as spoken language identification. In this paper, we present a neural model for Slavic language identification in speech signals and analyze its emergent representations to investigate whether they reflect objective measures of language relatedness and/or non-linguists' perception of language similarity. While our analysis shows that the language representation space indeed captures language relatedness to a great extent, we find perceptual confusability between languages in our study to be the best predictor of the language representation similarity.
