SD LG AS · Apr 29, 2021

End-to-End Speech Recognition from Federated Acoustic Models

Yan Gao, Titouan Parcollet, Salah Zaiem, Javier Fernandez-Marques, Pedro P. B. de Gusmao, Daniel J. Beutel, Nicholas D. Lane

arXiv:2104.14297v2

Originality: Incremental advance

AI Analysis

This work addresses the lack of realistic federated learning setups for speech recognition, which matters to developers and researchers aiming to deploy privacy-preserving ASR systems, although the contribution is incremental because it builds on existing FL methods.

The paper tackles the problem of training end-to-end automatic speech recognition models in realistic federated learning settings with heterogeneous data. It constructs a challenging experimental setup from the French and Italian CommonVoice sets and compares three aggregation strategies. The results show that WER-based aggregation performs best in both the cross-silo and the cross-device scenarios, the latter with up to 4K clients.

Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic environments and noises. We present the first empirical study on attention-based sequence-to-sequence End-to-End (E2E) ASR models with three aggregation weighting strategies: standard FedAvg, loss-based aggregation and a novel word error rate (WER)-based aggregation, compared in two realistic FL scenarios: cross-silo with 10 clients and cross-device with 2K and 4K clients. Our analysis on E2E ASR from heterogeneous and realistic federated acoustic models provides the foundations for future research and development of realistic FL-based ASR applications.
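To make the three aggregation weighting strategies concrete, here is a minimal sketch of server-side weighted averaging. It assumes hypothetical weighting rules that are not taken from the paper: FedAvg weights each client by its number of local training samples, while the loss-based and WER-based variants apply a softmax over the negated client loss or WER, so that clients with lower loss or WER receive larger weights. The function names `aggregation_weights` and `aggregate` and the `temperature` parameter are illustrative only, and model parameters are represented as plain lists of floats rather than real tensors.

```python
"""Sketch of weighted federated aggregation (hypothetical weighting rules).

Assumed rules, NOT the paper's exact formulas:
- "fedavg": weight proportional to the number of local training samples
- "loss":   softmax over negated client losses (lower loss -> higher weight)
- "wer":    softmax over negated client WERs (lower WER -> higher weight)
"""
import math
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # parameter name -> flattened values


def _softmax_neg(values: Sequence[float], temperature: float) -> List[float]:
    # Numerically stable softmax over -value / temperature.
    scaled = [-v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregation_weights(
    strategy: str,
    num_samples: Sequence[int],
    losses: Sequence[float],
    wers: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Return one normalized weight per client (weights sum to 1)."""
    if strategy == "fedavg":
        total = sum(num_samples)
        return [n / total for n in num_samples]
    if strategy == "loss":
        return _softmax_neg(losses, temperature)
    if strategy == "wer":
        return _softmax_neg(wers, temperature)
    raise ValueError(f"unknown strategy: {strategy}")


def aggregate(client_params: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted average of every named parameter vector across clients."""
    merged: Params = {}
    for name, first in client_params[0].items():
        merged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, client_params))
            for i in range(len(first))
        ]
    return merged


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    for strategy in ("fedavg", "loss", "wer"):
        weights = aggregation_weights(strategy, [100, 300], [0.5, 0.9], [0.2, 0.4])
        print(strategy, weights, aggregate(clients, weights))
```

With FedAvg, the two clients above receive weights 0.25 and 0.75, so the merged parameters are `[2.5, 3.5]`. Because WER values lie between 0 and 1, a temperature of 1.0 yields nearly uniform weights; a smaller temperature sharpens the preference for low-WER clients. This is one possible design choice for the sketch, not the paper's.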

