AS AI CL LG, Jan 11, 2024

End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2

Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

arXiv:2401.06183v1, 5 citations, h-index: 3

Originality: Synthesis-oriented

AI Analysis

This work addresses cross-lingual communication barriers for Hindi speakers, but its contribution is incremental because it combines existing technologies.

The paper tackles Hindi-to-English speech conversion by integrating XLSR Wav2Vec2 for ASR, mBART for NMT, and a TTS component (Bark, per the title), yielding a unified framework that synthesizes English audio from spoken Hindi.

Speech has long been a barrier to effective communication and connection, persisting as a challenge in our increasingly interconnected world. This research paper introduces a transformative solution to this persistent obstacle: an end-to-end speech conversion framework tailored for Hindi-to-English translation, culminating in the synthesis of English audio. By integrating cutting-edge technologies such as XLSR Wav2Vec2 for automatic speech recognition (ASR), mBART for neural machine translation (NMT), and a Text-to-Speech (TTS) synthesis component, this framework offers a unified and seamless approach to cross-lingual communication. We delve into the intricate details of each component, elucidating their individual contributions and exploring the synergies that enable a fluid transition from spoken Hindi to synthesized English audio.
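
The three-stage cascade described in the abstract (ASR, then NMT, then TTS) can be pictured as a simple composition of functions. The following is a minimal sketch, not the authors' implementation: the class name `SpeechToSpeechPipeline` and the three stub functions are hypothetical placeholders that stand in for the XLSR Wav2Vec2, mBART, and TTS models, and audio is represented as raw bytes purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechToSpeechPipeline:
    """Hypothetical cascade: Hindi audio -> Hindi text -> English text -> English audio."""

    asr: Callable[[bytes], str]        # stands in for the fine-tuned XLSR Wav2Vec2 stage
    translate: Callable[[str], str]    # stands in for the mBART stage
    tts: Callable[[str], bytes]        # stands in for the TTS stage

    def __call__(self, hindi_audio: bytes) -> bytes:
        hindi_text = self.asr(hindi_audio)
        english_text = self.translate(hindi_text)
        return self.tts(english_text)


# Stub stages (assumptions for illustration only; no real model is loaded).
def stub_asr(audio: bytes) -> str:
    # Pretend every input clip contains the same Hindi greeting.
    return "नमस्ते"


def stub_translate(hindi_text: str) -> str:
    # Tiny lookup table in place of a neural translation model.
    lookup = {"नमस्ते": "hello"}
    return lookup.get(hindi_text, "<unknown>")


def stub_tts(english_text: str) -> bytes:
    # Encode the text as bytes in place of a synthesized waveform.
    return english_text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(asr=stub_asr, translate=stub_translate, tts=stub_tts)
english_audio = pipeline(b"\x00\x01")  # placeholder bytes for a Hindi audio clip
```

Because each stage is only a callable with a fixed input and output type, any one of them could be swapped for a real model wrapper without touching the other two, which is the main appeal of a cascaded design over a single monolithic model.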
