CL · SD · AS · May 7, 2021

A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect

arXiv:2105.03409v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides a comparative analysis for users choosing an STT service for French, but it is incremental: it applies existing evaluation methods to new data without novel algorithmic contributions.

The study benchmarked four cloud-based speech-to-text services on French speech, testing 40,158 files per service across clean and noisy conditions. Microsoft Azure had the lowest error rate on clean speech (9.09%) and was highly robust to noise, while IBM Watson was highly sensitive to noise.
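
Error rates such as the 9.09% above are usually reported as word error rate (WER): the word-level edit distance between a reference transcript and the service's output, divided by the number of reference words. The sketch below assumes that definition; the page does not spell out the paper's exact metric or text normalization, so the helpers (_normalize, word_error_rate, corpus_wer) are hypothetical, and the lowercase-and-split normalization is a deliberate simplification.

```python
def _word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, deletions and insertions
    turning ref_words into hyp_words (Levenshtein distance over words)."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (single-row dynamic programming).
    prev = list(range(len(hyp_words) + 1))
    for i in range(1, len(ref_words) + 1):
        curr = [i] + [0] * len(hyp_words)
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1]


def _normalize(text):
    # Hypothetical, minimal normalization: lowercase and split on whitespace.
    # A real French evaluation would also handle punctuation, elisions and numbers.
    return text.lower().split()


def word_error_rate(reference, hypothesis):
    """WER of a single utterance: word errors / reference words."""
    ref = _normalize(reference)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    return _word_edit_distance(ref, _normalize(hypothesis)) / len(ref)


def corpus_wer(pairs):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = _normalize(reference)
        errors += _word_edit_distance(ref, _normalize(hypothesis))
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("corpus must contain at least one reference word")
    return errors / ref_words


if __name__ == "__main__":
    print(word_error_rate("Le chat est noir", "le chat noir"))  # 0.25
    print(corpus_wer([("le chat est noir", "le chat noir"),
                      ("bonjour", "bonsoir")]))  # 0.4
```

corpus_wer pools errors over all files rather than averaging per-file WERs, so long utterances weigh proportionally more; whether the paper aggregates this way is not stated here.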

This study presents a large-scale benchmark of cloud-based Speech-To-Text (STT) systems: Google Cloud Speech-To-Text, Microsoft Azure Cognitive Services, Amazon Transcribe, and IBM Watson Speech to Text. For each system, 40,158 clean and noisy speech files (about 101 hours) are tested. The effect of background noise on STT quality is also evaluated at 5 different signal-to-noise ratios (SNRs), from 40 dB to 0 dB. Results showed that Microsoft Azure provided the lowest transcription error rate (9.09%) on clean speech, with high robustness to noisy environments. Google Cloud and Amazon Transcribe gave similar performance, but the latter is very limited for time-constrained usage. Though IBM Watson works correctly in quiet conditions, it is highly sensitive to noisy speech, which could strongly limit its application in real-life situations.
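
Noisy test files at a fixed SNR are commonly built by scaling a noise recording so that 10·log10(P_speech / P_noise) equals the target, then adding it to the clean speech. The sketch below assumes that additive-mixing approach; the page does not describe the paper's noise sources or mixing procedure. The evenly spaced grid SNR_LEVELS_DB, the sample rate and the sine-tone stand-in for speech are all hypothetical.

```python
import math
import random

# Hypothetical grid: the abstract gives five SNRs from 40 dB to 0 dB;
# evenly spaced 10 dB steps are an assumption.
SNR_LEVELS_DB = [40, 30, 20, 10, 0]


def mean_power(signal):
    """Average power (mean squared amplitude) of a sampled signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    SNR_dB = 10 * log10(P_speech / P_scaled_noise), hence the noise gain is
    sqrt(P_speech / (P_noise * 10 ** (snr_db / 10))).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # trim noise to the speech length
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    gain = math.sqrt(mean_power(speech) / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR of a mixture from the clean speech it was built from."""
    residual = [m - s for m, s in zip(mixture, speech)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rate = 16_000  # hypothetical sample rate in Hz
    # 100 ms of a 440 Hz tone as a stand-in for a clean speech file.
    speech = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate // 10)]
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(len(speech))]
    for snr in SNR_LEVELS_DB:
        mixture = mix_at_snr(speech, noise, snr)
        print(f"target {snr} dB, measured {measured_snr_db(speech, mixture):.2f} dB")
```

Scaling the noise rather than the speech keeps the speech level identical across all five conditions, so any change in a service's error rate can be attributed to the added noise alone.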
