CL SD AS

Mar 1, 2022

Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training

Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli

arXiv:2203.00648v31.68 citations

h-index: 52

Originality: Incremental advance

AI Analysis

This study provides insights for speech recognition researchers by dissecting domain factors, though it is incremental as it builds on prior work on domain mismatch.

The paper tackles the problem of understanding how individual domain factors like accent and syntax affect self-supervised pre-training for automatic speech recognition, finding that phonetic factors are crucial while grammatical ones are less important.

Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment. Previous work explores the effect of domain mismatch in automatic speech recognition between pre-training and fine-tuning as a whole but does not dissect the contribution of individual factors. In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition. To do so, we pre-train models either on modified natural speech or synthesized audio, with a single domain factor modified, and then measure performance after fine-tuning. Results show that phonetic domain factors play an important role during pre-training while grammatical and syntactic factors are far less important. To our knowledge, this is the first study to better understand the domain characteristics of pre-trained sets in self-supervised pre-training for speech.
