AS, CL, SD | Jul 23, 2023

MyVoice: Arabic Speech Resource Collaboration Platform

arXiv:2308.02503v1 | h-index: 32
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for diverse Arabic speech resources to improve dialectal speech technologies, though the contribution is incremental because it builds on existing crowdsourcing methods.

The authors tackle the problem of limited dialectal Arabic speech data by introducing MyVoice, a crowdsourcing platform that collects and validates Arabic speech recordings, with the aim of building large, publicly available dialectal datasets.

We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. The platform offers an opportunity to design large dialectal speech datasets and make them publicly available. MyVoice allows contributors to select a fine-grained dialect at the city or country level and record the displayed utterances. Users can switch roles between contributor and annotator. The platform incorporates a quality assurance system that filters out low-quality and spurious recordings before sending them for validation. During the validation phase, contributors can assess the quality of recordings, annotate them, and provide feedback, which is then reviewed by administrators. Furthermore, the platform gives administrators the flexibility to add new data or tasks beyond dialectal speech and word collection, which are then displayed to contributors, thus enabling collaborative efforts in gathering large and diverse Arabic speech data.

