CL · Sep 21, 2021

How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings

arXiv:2109.10179v1 · 663 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses how neural networks process speech in unknown languages, with implications for modeling cross-lingual speech processing. Its contribution is incremental, since it applies existing methods to analyze language similarity.

The study investigated whether typological similarity between a model's training language and an unknown language affects neural network representations of speech sounds, finding that it does influence representational similarity in acoustic word embeddings across seven Indo-European languages.

How do neural networks "perceive" speech sounds from unknown languages? Does the typological similarity between the model's training language (L1) and an unknown language (L2) have an impact on the model representations of L2 speech signals? To answer these questions, we present a novel experimental design based on representational similarity analysis (RSA) to analyze acoustic word embeddings (AWEs) -- vector representations of variable-duration spoken-word segments. First, we train monolingual AWE models on seven Indo-European languages with various degrees of typological similarity. We then employ RSA to quantify the cross-lingual similarity by simulating native and non-native spoken-word processing using AWEs. Our experiments show that typological similarity indeed affects the representational similarity of the models in our study. We further discuss the implications of our work on modeling speech processing and language similarity with neural networks.
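
The RSA comparison described in the abstract can be illustrated with a minimal sketch. It assumes the classic RSA recipe: build a representational dissimilarity matrix (RDM) per model over the same set of spoken-word stimuli, then correlate the two RDMs. The cosine distance, the Spearman correlation, the function names (`rdm`, `rsa_score`), and the random stand-in embeddings are all assumptions made for illustration; the abstract does not say which similarity measure the authors use, so this should not be read as their implementation.

```python
import numpy as np

def rdm(embeddings):
    """Cosine-distance dissimilarity matrix (n x n) over n word embeddings."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return 1.0 - unit @ unit.T

def _ranks(values):
    # Ordinal ranks; ties are not averaged, which is acceptable for
    # continuous-valued distances where exact ties are unlikely.
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def rsa_score(emb_a, emb_b):
    """Spearman correlation between the upper triangles of two RDMs.

    Both inputs must embed the same n stimuli in the same order; the
    embedding dimensions of the two models may differ.
    """
    assert emb_a.shape[0] == emb_b.shape[0]
    upper = np.triu_indices(emb_a.shape[0], k=1)
    dist_a = rdm(emb_a)[upper]
    dist_b = rdm(emb_b)[upper]
    return float(np.corrcoef(_ranks(dist_a), _ranks(dist_b))[0, 1])

# Hypothetical stand-ins for AWEs of the same 50 spoken words.
rng = np.random.default_rng(0)
native = rng.normal(size=(50, 16))        # model trained on the test language
rotation = np.linalg.qr(rng.normal(size=(16, 16)))[0]
similar = native @ rotation               # same geometry, different basis
unrelated = rng.normal(size=(50, 16))     # unrelated geometry

score_similar = rsa_score(native, similar)
score_unrelated = rsa_score(native, unrelated)
print(score_similar, score_unrelated)
```

Because the score depends only on pairwise distances within each model, it is unaffected by rotations of the embedding space and does not require the two models to share an embedding dimension. In this toy setup, a higher score would stand in for a non-native model whose view of the stimuli is closer to the native model's.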

Code Implementations: 1 repo
