SD, CL, AS · Aug 21, 2023

LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices

Joerg Schmalenstroeer, Tobias Gburrek, Reinhold Haeb-Umbach

arXiv:2308.10682v1

Originality: synthesis-oriented

AI Analysis

This dataset supports researchers and engineers who develop robust meeting transcription systems for asynchronous wireless acoustic sensor networks. Its contribution is incremental, since it builds on the existing LibriCSS design.

The authors introduce LibriWASN, a dataset for testing algorithms in meeting scenarios with unsynchronized recording devices. It provides 29 channels recorded by nine devices, together with ground-truth diarization, and is designed to evaluate clock synchronization, separation, diarization, and transcription systems.
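To make the effect of unsynchronized sampling clocks concrete, the minimal sketch below computes how far one device's recording drifts from an ideal clock when it samples with a constant sampling rate offset (SRO), expressed in parts per million (ppm). The function name and the example values (16 kHz nominal rate, 50 ppm, a 10-minute meeting) are hypothetical illustration choices, not figures reported for LibriWASN.

```python
def accumulated_drift_samples(sro_ppm: float, nominal_rate_hz: float, duration_s: float) -> float:
    """Hypothetical helper: sampling time offset (in samples) after duration_s seconds.

    Assumes a constant SRO, i.e. the device actually samples at
    nominal_rate_hz * (1 + sro_ppm * 1e-6). Positive values mean the device
    has produced more samples than an ideal clock would have.
    """
    # Extra samples per second (nominal_rate_hz * sro_ppm * 1e-6), accumulated over time.
    return nominal_rate_hz * sro_ppm * 1e-6 * duration_s


if __name__ == "__main__":
    # Hypothetical values: 16 kHz nominal rate, 50 ppm offset, 10-minute recording.
    rate = 16_000.0
    drift = accumulated_drift_samples(sro_ppm=50.0, nominal_rate_hz=rate, duration_s=600.0)
    print(f"drift after 10 min: {drift:.1f} samples ({drift / rate * 1e3:.1f} ms)")
```

Under the constant-offset assumption the drift grows linearly with recording time, which is why even offsets of a few ppm become significant for multi-channel processing over a full meeting.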

We present LibriWASN, a data set whose design closely follows the LibriCSS meeting recognition data set, with the marked difference that the data is recorded with devices that are randomly positioned on a meeting table and whose sampling clocks are not synchronized. Nine different devices (five smartphones with a single recording channel and four microphone arrays) are used to record a total of 29 channels. Other than that, the data set closely follows the LibriCSS design: the same LibriSpeech sentences are played back from eight loudspeakers arranged around a meeting table, and the data is organized in subsets with different percentages of speech overlap. LibriWASN is meant as a test set for clock synchronization algorithms and for meeting separation, diarization, and transcription systems on ad-hoc wireless acoustic sensor networks. Because of its similarity to LibriCSS, meeting transcription systems developed for LibriCSS can readily be tested on LibriWASN. The data set is recorded in two different rooms and is complemented with ground-truth diarization information of who speaks when.
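Since the data set ships with ground-truth diarization (who speaks when), a quantity such as the fraction of overlapped speech can be derived from the segment annotations. The minimal sketch below assumes a hypothetical annotation format of `(speaker_id, start_s, end_s)` tuples and uses one common definition of overlap ratio: time with two or more active speakers divided by time with at least one active speaker. Neither the format nor the definition is taken from LibriWASN or LibriCSS, so the values may differ from how the overlap subsets were actually constructed.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical annotation format: (speaker_id, start in seconds, end in seconds).
Segment = Tuple[str, float, float]


def overlap_ratio(segments: Iterable[Segment]) -> float:
    """Overlapped-speech time divided by total speech time (sweep-line sketch).

    Assumes segments of the same speaker do not overlap each other; otherwise
    such self-overlap would also be counted as overlapped speech.
    """
    events: List[Tuple[float, int]] = []
    for _speaker, start, end in segments:
        if end > start:
            events.append((start, 1))   # a speaker becomes active
            events.append((end, -1))    # a speaker becomes inactive
    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so back-to-back segments are not counted as overlap.
    events.sort()
    active = 0
    prev_t: Optional[float] = None
    speech = 0.0
    overlap = 0.0
    for t, delta in events:
        if prev_t is not None and active > 0:
            span = t - prev_t
            speech += span
            if active >= 2:
                overlap += span
        active += delta
        prev_t = t
    return overlap / speech if speech > 0 else 0.0


if __name__ == "__main__":
    # Two hypothetical speakers overlapping between 8 s and 10 s.
    segments = [("A", 0.0, 10.0), ("B", 8.0, 15.0)]
    print(f"{overlap_ratio(segments):.3f}")  # 0.133
```

The sweep over sorted boundaries visits each segment edge once, so the cost is dominated by the sort, which keeps the sketch practical even for long meetings with many segments.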

