CL | SD | AS | May 9, 2023

Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models

arXiv:2305.05201v1 | 4 citations
Originality: Incremental advance
AI Analysis

This study addresses a fundamental gap in self-supervised learning research by comparing cross-lingual and monolingual models for Japanese ASR, providing guidance for future Japanese SSL studies.

The paper empirically compares cross-lingual and monolingual self-supervised learning models for Japanese automatic speech recognition. It examines how much unlabeled Japanese data is needed to match a cross-lingual model pre-trained on tens of thousands of hours of English and/or multilingual data, and it reports state-of-the-art results on multiple ASR tasks.

Self-supervised learning (SSL) has been dramatically successful not only in monolingual but also in cross-lingual settings. However, since the two settings have been studied individually in general, there has been little research focusing on how effective a cross-lingual model is in comparison with a monolingual model. In this paper, we investigate this fundamental question empirically with Japanese automatic speech recognition (ASR) tasks. First, we begin by comparing the ASR performance of cross-lingual and monolingual models for two different language tasks while keeping the acoustic domain as identical as possible. Then, we examine how much unlabeled data collected in Japanese is needed to achieve performance comparable to a cross-lingual model pre-trained with tens of thousands of hours of English and/or multilingual data. Finally, we extensively investigate the effectiveness of SSL in Japanese and demonstrate state-of-the-art performance on multiple ASR tasks. Since there is no comprehensive SSL study for Japanese, we hope this study will guide Japanese SSL research.
