CL, LG, SD, AS | Jun 3, 2023

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

arXiv:2306.02153v1
AI Analysis

This work addresses the challenge of building acoustic word embeddings for low-resource languages, offering incremental improvements in speed and performance.

The paper tackles the problem of creating acoustic word embeddings for untranscribed target languages with two improvements: continued pre-training of a self-supervised model on the target language, and a multilingual phone recognizer that mines pairs for training the pooling function. Both methods outperform a recent approach on word discrimination, and the phone-recognizer method is orders of magnitude faster than KNN-based pair mining while remaining highly data efficient.

Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled representations from a pre-trained self-supervised English model were suggested as a promising alternative, but their performance on target languages was not fully competitive. Here, we explore improvements to both approaches: we use continued pre-training to adapt the self-supervised model to the target language, and we use a multilingual phone recognizer (MPR) to mine phone n-gram pairs for training the pooling function. Evaluating on four languages, we show that both methods outperform a recent approach on word discrimination. Moreover, the MPR method is orders of magnitude faster than KNN, and is highly data efficient. We also show a small improvement from performing learned pooling on top of the continued pre-trained representations.
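The two building blocks named in the abstract, mean pooling and phone n-gram pair mining, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(phone, start_frame, end_frame)` recognizer output format, and the default `n=3` are all assumptions made for illustration. Mean pooling averages a variable-length sequence of frame vectors into one fixed-size embedding; pair mining here groups segments by exact phone n-gram match, so it needs only a single pass with dictionary lookups rather than a nearest-neighbor search.

```python
from collections import defaultdict
from itertools import combinations


def mean_pool(frames):
    """Average a variable-length list of frame vectors into one fixed-size vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def mine_ngram_pairs(utterances, n=3):
    """Pair up segments whose recognized phone n-grams match exactly.

    `utterances` maps an utterance id to a list of
    (phone, start_frame, end_frame) tuples. This format is hypothetical,
    chosen for the sketch rather than taken from the paper.
    Returns a list of ((utt_a, start, end), (utt_b, start, end)) pairs.
    """
    occurrences = defaultdict(list)
    for utt_id, phones in utterances.items():
        for i in range(len(phones) - n + 1):
            window = phones[i:i + n]
            key = tuple(p for p, _, _ in window)
            # Segment spans from the first phone's start to the last phone's end.
            occurrences[key].append((utt_id, window[0][1], window[-1][2]))

    pairs = []
    for segments in occurrences.values():
        # Every two occurrences of the same n-gram form one training pair.
        pairs.extend(combinations(segments, 2))
    return pairs
```

In a full system, each mined segment would be cut out of the frame-level representations and the resulting pairs used to train the pooling function; `mean_pool` is the parameter-free baseline that a learned pooling function would replace.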

