ASCL Jul 5, 2023

Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings

arXiv:2307.02083v1, 4 citations, h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses the challenge of semantic representation in speech processing for low-resource languages, though it is incremental as it builds on existing multilingual transfer methods.

The paper tackles the problem of learning semantic acoustic word embeddings from untranscribed speech in a target language by leveraging a pre-trained multilingual phonetic model. It outperforms all previous semantic acoustic word embedding methods on an intrinsic word similarity task and shows, for the first time, that such embeddings can be used for semantic query-by-example search.

Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content so that different realisations of the same word have similar embeddings. In this paper we explore semantic AWE modelling. These AWEs should not only capture phonetics but also the meaning of a word (similar to textual word embeddings). We consider the scenario where we only have untranscribed speech in a target language. We introduce a number of strategies leveraging a pre-trained multilingual AWE model -- a phonetic AWE model trained on labelled data from multiple languages excluding the target. Our best semantic AWE approach involves clustering word segments using the multilingual AWE model, deriving soft pseudo-word labels from the cluster centroids, and then training a Skipgram-like model on the soft vectors. In an intrinsic word similarity task measuring semantics, this multilingual transfer approach outperforms all previous semantic AWE methods. We also show -- for the first time -- that AWEs can be used for downstream semantic query-by-example search.
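The pipeline described in the abstract (cluster, derive soft pseudo-word labels from the centroids, train a Skipgram-like model on the soft vectors) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for the multilingual AWE model's outputs, and the plain k-means routine, the softmax-over-distances soft labels, the `temperature` parameter, the context window and the single linear projection are all assumptions made for the example. The function names `kmeans`, `soft_labels`, `context_pairs` and `train_soft_skipgram` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every segment to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids


def soft_labels(X, centroids, temperature=1.0):
    """Assumed soft labelling: softmax over negative squared distances."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def context_pairs(n_segments, window=1):
    """(centre, context) index pairs for segments in utterance order."""
    pairs = []
    for i in range(n_segments):
        for j in range(max(0, i - window), min(n_segments, i + window + 1)):
            if j != i:
                pairs.append((i, j))
    return pairs


def train_soft_skipgram(S, pairs, dim=8, lr=0.1, epochs=50, seed=0):
    """Skipgram-like model: predict the context soft vector from the centre one."""
    rng = np.random.default_rng(seed)
    k = S.shape[1]
    W_in = rng.normal(scale=0.1, size=(k, dim))
    W_out = rng.normal(scale=0.1, size=(dim, k))
    centre = S[[i for i, _ in pairs]]
    target = S[[j for _, j in pairs]]
    for _ in range(epochs):
        hidden = centre @ W_in
        logits = hidden @ W_out
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of mean cross-entropy against soft targets.
        grad_logits = (probs - target) / len(pairs)
        grad_W_out = hidden.T @ grad_logits
        grad_W_in = centre.T @ (grad_logits @ W_out.T)
        W_in -= lr * grad_W_in
        W_out -= lr * grad_W_out
    return W_in


# Toy stand-in for phonetic AWEs of 60 consecutive word segments (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))
centroids = kmeans(X, k=3)
S = soft_labels(X, centroids, temperature=5.0)
pairs = context_pairs(len(X), window=2)
W_in = train_soft_skipgram(S, pairs)
semantic = S @ W_in  # one semantic vector per segment
```

In this sketch the semantic embedding of a segment is its soft label vector projected through the learned input matrix, so segments that fall near the same centroids and occur in similar contexts end up close together. A real system would replace the toy inputs with embeddings from the pre-trained multilingual model and treat each utterance as its own context sequence.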

