LG · AI · CL · HC · AS · Jun 29, 2024

Open-Source Conversational AI with SpeechBrain 1.0

Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra

arXiv:2407.00463v5 · 29.398 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This toolkit improves the accessibility and standardization of Conversational AI research for the community, although, as an update to an existing framework, its contribution is incremental.

The paper presents SpeechBrain 1.0, an open-source Conversational AI toolkit that addresses the need for transparency and replicability in speech processing by providing more than 200 recipes and more than 100 pre-trained models, along with new features such as LLM integration and a benchmark repository.

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.

