CL, SD, AS | Jan 29, 2025

Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

arXiv:2501.17615v1 | h-index: 17 | IEEE Transactions on Audio, Speech, and Language Processing
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of limited data for low-resource languages in multilingual ASR systems, though it appears incremental because it builds on existing hierarchical Softmax methods.

The paper aims to improve multilingual speech recognition for low-resource languages. It proposes a cross-lingual embedding clustering method for constructing a hierarchical Softmax decoder and reports improved ASR accuracy on a downsampled dataset of 15 languages.

We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. It utilizes a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which enables similar tokens across different languages to share similar decoder representations. It addresses the limitations of the previous Huffman-based H-Softmax method, which relied on shallow features in token similarity assessments. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.
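To make the idea in the abstract concrete, here is a minimal sketch of how clustering token embeddings can yield a hierarchical Softmax tree in which similar tokens from different languages share path prefixes. It is an illustration under stated assumptions, not the paper's procedure: the toy embeddings, the greedy centroid-based merging, and the names `build_tree`, `paths`, `token_prob`, and `node_logit` are all hypothetical. In a real decoder the branching logits would be computed from the decoder state rather than fixed by hand.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def build_tree(embeddings):
    """Greedy agglomerative clustering (an assumed stand-in for the paper's method).

    Repeatedly merges the two clusters whose centroids are most similar.
    A tree is either a token string (leaf) or a (left, right) pair.
    """
    clusters = [(tok, list(vec), 1) for tok, vec in embeddings.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        (ti, ci, ni), (tj, cj, nj) = clusters[i], clusters[j]
        # Size-weighted mean of the two centroids.
        centroid = [(a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj)]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((ti, tj), centroid, ni + nj)]
    return clusters[0][0]

def paths(tree, prefix=""):
    """Map each token to its binary path: '0' = left branch, '1' = right branch."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    out = paths(left, prefix + "0")
    out.update(paths(right, prefix + "1"))
    return out

def token_prob(path, node_logit):
    """P(token) as a product of binary decisions along its path.

    node_logit maps an internal node (identified by its path prefix)
    to the logit of branching right at that node.
    """
    p = 1.0
    for depth, bit in enumerate(path):
        p_right = 1.0 / (1.0 + math.exp(-node_logit[path[:depth]]))
        p *= p_right if bit == "1" else 1.0 - p_right
    return p

# Hypothetical toy embeddings: the same sound in two languages lies close together.
toy = {
    "en:a": [1.0, 0.1],
    "es:a": [0.9, 0.2],
    "en:t": [0.1, 1.0],
    "es:t": [0.2, 0.9],
}
codes = paths(build_tree(toy))

# Hand-picked logits for the three internal nodes of this four-leaf tree.
node_logit = {"": 0.3, "0": -1.2, "1": 0.8}
total = sum(token_prob(code, node_logit) for code in codes.values())
```

In this toy setup the two "a" tokens end up under one internal node and the two "t" tokens under the other, so each cross-lingual pair shares the parameters of its first branching decision. The path probabilities also sum to 1 over the vocabulary, which is what lets the tree replace a flat Softmax.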

