CLAS Dec 1, 2020

Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios

arXiv:2012.00876v1 (3 citations)
AI Analysis

This work addresses language similarity for speech-based NLP tasks, particularly in low-resource scenarios, where existing multilingual work covers only a small subset of languages.

This paper proposes a deep learning method to analyze language similarity from acoustic examples, specifically by training a model on the Wilderness dataset and comparing its latent space with classical language family findings. This approach offers a new direction for cross-lingual data augmentation in speech-based NLP tasks.

Existing multilingual speech NLP works focus on a relatively small subset of languages, and thus current linguistic understanding of languages predominantly stems from classical approaches. In this work, we propose a method to analyze language similarity using deep learning. Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language family findings. Our approach provides a new direction for cross-lingual data augmentation in any speech-based NLP task.
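
The comparison between a model's latent space and known language families can be illustrated with a small sketch. This is not the paper's implementation: the function names (`language_centroids`, `cosine`, `family_agreement`), the nearest-neighbour agreement score, and the synthetic embeddings and family labels are all assumptions made for illustration. It averages utterance embeddings per language, then checks how often a language's nearest neighbour in embedding space belongs to the same family.

```python
import math
import random


def language_centroids(embeddings, labels):
    """Average the utterance embeddings of each language (hypothetical helper)."""
    sums, counts = {}, {}
    for vec, lang in zip(embeddings, labels):
        acc = sums.setdefault(lang, [0.0] * len(vec))
        for k, value in enumerate(vec):
            acc[k] += value
        counts[lang] = counts.get(lang, 0) + 1
    return {lang: [v / counts[lang] for v in acc] for lang, acc in sums.items()}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def family_agreement(centroids, family_of):
    """Fraction of languages whose nearest other language shares their family."""
    hits = 0
    for lang, vec in centroids.items():
        others = [other for other in centroids if other != lang]
        nearest = max(others, key=lambda other: cosine(vec, centroids[other]))
        hits += family_of[nearest] == family_of[lang]
    return hits / len(centroids)


# Synthetic stand-in data (assumption): two made-up families, two languages each.
# Real embeddings would come from a model trained on speech data.
rng = random.Random(0)
family_of = {"lang_a": "fam_1", "lang_b": "fam_1",
             "lang_c": "fam_2", "lang_d": "fam_2"}
family_direction = {"fam_1": [1.0, 0.0, 0.0], "fam_2": [0.0, 1.0, 0.0]}

embeddings, labels = [], []
for lang, fam in family_of.items():
    for _ in range(20):
        # Each utterance embedding is its family direction plus small noise.
        embeddings.append([x + rng.gauss(0.0, 0.05) for x in family_direction[fam]])
        labels.append(lang)

centroids = language_centroids(embeddings, labels)
agreement = family_agreement(centroids, family_of)
print(agreement)
```

On this deliberately well-separated toy data every language's nearest neighbour falls in its own family. With real embeddings, a score well below 1 would indicate where the learned space departs from the classical family grouping.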

Code Implementations: 1 repo