CL SD AS · Jan 3, 2023

Supervised Acoustic Embeddings And Their Transferability Across Languages

arXiv:2301.01020v1 · 28.9290 citations · h-index: 8 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses speech recognition in low-resource settings by improving the cross-lingual transferability of acoustic word embeddings. The advance is incremental, since it builds on existing self-supervised and supervised frameworks.

The paper tackles the challenge of modeling phonetic content in speech recognition. It shows that supervised acoustic word embedding models fed with pre-trained self-supervised features outperform purely self-supervised models, and that English-trained models transfer to other languages without adaptation.

In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings. Self-supervised pre-training has been proposed as a way to improve both supervised and unsupervised speech recognition, including frame-level feature representations and Acoustic Word Embeddings (AWE) for variable-length segments. However, self-supervised models alone cannot learn perfect separation of the linguistic content as they are trained to optimize indirect objectives. In this work, we experiment with different pre-trained self-supervised features as input to AWE models and show that they work best within a supervised framework. Models trained on English can be transferred to other languages with no adaptation and outperform self-supervised models trained solely on the target languages.
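To make the notion of an acoustic word embedding concrete, the following is a minimal sketch, not the paper's method. It assumes the simplest possible embedding function, mean pooling over frame-level features, whereas the paper trains supervised AWE models whose architecture is not described in this abstract. The feature vectors are synthetic stand-ins for pre-trained self-supervised features, and all names (`mean_pool`, `cosine_distance`, `fake_segment`) are hypothetical.

```python
import math
import random


def mean_pool(frames):
    """Collapse a variable-length list of frame vectors into one fixed-size vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; small values mean similar embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible


def fake_segment(center, n_frames, noise=0.1):
    """Synthetic frame-level features: a word 'prototype' plus Gaussian noise.

    Assumption: real inputs would be frame features from a pre-trained
    self-supervised speech model, not random vectors.
    """
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n_frames)]


# Two hypothetical word types, represented by 4-dimensional prototypes.
word_a = [1.0, 0.0, 0.5, -0.5]
word_b = [-0.5, 1.0, 0.0, 0.5]

# Segments of different durations (40, 65, and 50 frames).
seg_a1 = fake_segment(word_a, 40)
seg_a2 = fake_segment(word_a, 65)
seg_b = fake_segment(word_b, 50)

# Every segment maps to a vector of the same size, regardless of its length.
emb_a1, emb_a2, emb_b = mean_pool(seg_a1), mean_pool(seg_a2), mean_pool(seg_b)

same_word = cosine_distance(emb_a1, emb_a2)   # two instances of the same word
diff_word = cosine_distance(emb_a1, emb_b)    # instances of different words
print(same_word < diff_word)
```

The point of the sketch is only the interface: segments of different lengths become fixed-size vectors that can be compared directly, and instances of the same word should lie closer together than instances of different words. A trained AWE model replaces `mean_pool` with a learned function.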
