Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
This work addresses the digital divide in speech technology for low-resource language communities, though it appears incremental because it builds on existing cross-lingual transfer methods.
The paper tackles the problem of scaling speech processing to hundreds of low-resource languages. It proposes an acoustic language similarity approach for identifying effective cross-lingual transfer pairs and demonstrates the approach on language family classification, speech recognition, and speech synthesis.
Speech processing systems currently do not support the vast majority of languages, in part due to the lack of data in low-resource languages. Cross-lingual transfer offers a compelling way to help bridge this digital divide by incorporating high-resource data into low-resource systems. Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages. However, scaling up speech systems to support hundreds of low-resource languages remains unsolved. To help bridge this gap, we propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages. We demonstrate the effectiveness of our approach in language family classification, speech recognition, and speech synthesis tasks.
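As a rough illustration of what identifying transfer pairs by similarity could look like, the sketch below ranks candidate source languages for a target language by cosine similarity between per-language vectors. The abstract does not specify how acoustic similarity is computed, so everything here is an assumption made for illustration: the per-language embedding vectors, the choice of cosine similarity, the function names `cosine_similarity` and `rank_transfer_sources`, and the toy language labels and values.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_transfer_sources(target, embeddings, top_k=2):
    """Return the top_k (language, similarity) pairs most similar to the target.

    `embeddings` maps a language label to a hypothetical acoustic vector,
    e.g. features averaged over that language's recordings (an assumption,
    not the method described in the abstract).
    """
    target_vec = embeddings[target]
    scored = [
        (lang, cosine_similarity(target_vec, vec))
        for lang, vec in embeddings.items()
        if lang != target
    ]
    # Highest similarity first: these are the candidate transfer sources.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy, made-up vectors; real systems would use learned or measured features.
toy_embeddings = {
    "lang_a": [1.0, 0.0, 0.0],
    "lang_b": [0.9, 0.1, 0.0],
    "lang_c": [0.0, 1.0, 0.0],
    "lang_d": [0.0, 0.9, 0.1],
}

if __name__ == "__main__":
    for lang, score in rank_transfer_sources("lang_a", toy_embeddings):
        print(f"{lang}: {score:.3f}")
```

Ranking one target against N languages costs N similarity computations, so a full pairwise table over a few hundred languages stays small; that is one plausible reading of why a similarity-based selection step can be efficient at this scale.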