CL · SD · AS · Dec 16, 2024

Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection

arXiv:2412.11978v1 · 19 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses the need for more efficient and cost-effective speech data acquisition, which matters most to researchers and organizations that rely on crowdsourcing. The advance is incremental, as it builds on existing validation methods.

The paper tackles quality assurance in crowdsourced speech data collection by using Speech Foundation Models (SFMs) to automate validation, and it reports an estimated cost saving of over 40.0% without degrading data quality.

While crowdsourcing is an established solution for facilitating and scaling the collection of speech data, the involvement of non-experts necessitates protocols to ensure final data quality. To reduce the costs of these essential controls, this paper investigates the use of Speech Foundation Models (SFMs) to automate the validation process, examining for the first time the cost/quality trade-off in data acquisition. Experiments conducted on French, German, and Korean data demonstrate that SFM-based validation has the potential to reduce reliance on human validation, resulting in an estimated cost saving of over 40.0% without degrading final data quality. These findings open new opportunities for more efficient, cost-effective, and scalable speech data acquisition.
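
As a rough illustration of the cost/quality trade-off described above, the following minimal sketch auto-accepts a recording when an SFM transcript is close enough to the prompted text and sends the rest to human validation. It is not the paper's method: the use of word error rate as the distance, the `threshold` value of 0.2, the function names, and the assumption that every human check costs the same are all hypothetical choices made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def route_recordings(items, threshold=0.2):
    """Split (prompt, sfm_transcript) pairs into auto-accepted and human-review lists.

    The threshold is an assumed value, not one taken from the paper.
    """
    auto, human = [], []
    for prompt, transcript in items:
        target = auto if word_error_rate(prompt, transcript) <= threshold else human
        target.append((prompt, transcript))
    return auto, human


def estimated_saving(n_auto: int, n_total: int) -> float:
    """Share of human validation avoided, assuming equal cost per recording."""
    return n_auto / n_total if n_total else 0.0


# Toy data: the transcripts stand in for SFM output and are invented for this example.
items = [
    ("bonjour tout le monde", "bonjour tout le monde"),
    ("guten morgen", "guten abend"),
    ("turn on the light", "turn on the light"),
]
auto, human = route_recordings(items)
saving = estimated_saving(len(auto), len(items))
```

Raising the threshold accepts more recordings automatically and increases the saving, at the risk of letting mismatched recordings through; lowering it does the opposite. That tension is the trade-off the abstract refers to.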

