CL, Jun 10, 2025

FROST-EMA: Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography Measurements with L1, L2 and Imitated L2 Accents

Satu Hopponen, Tomi Kinnunen, Alexandre Nikolaev, Rosa González Hautamäki, Lauri Tavi, Einar Meister

arXiv:2506.08981v1, 2.7, h-index: 11, INTERSPEECH

Originality: Synthesis-oriented

AI Analysis

This dataset supports the study of phonetic and technological aspects of language variability and is aimed at researchers in speech processing and linguistics. The contribution is incremental, since it builds on existing EMA datasets.

The authors introduce the FROST-EMA dataset, which contains speech from 18 bilingual speakers in their native language (L1), their second language (L2), and an imitated L2 accent. They also report two preliminary case studies, one on automatic speaker verification and one on articulatory patterns.

We introduce FROST-EMA (Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography), a new corpus. It covers 18 bilingual speakers, each of whom produced speech in their native language (L1), their second language (L2), and an imitated L2 (a fake foreign accent). The corpus enables research into language variability from both phonetic and technological points of view. Accordingly, we include two preliminary case studies, one for each perspective. The first explores the impact of L2 and imitated L2 speech on the performance of an automatic speaker verification system, while the second illustrates the articulatory patterns of one speaker in L1, L2, and a fake accent.
