Sangeet Sagar

CL
5papers
1,094citations
Novelty29%
AI Score27

5 Papers

ASJun 6, 2023
RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain

Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer et al.

Despite the recent advancements in speech recognition, there are still difficulties in accurately transcribing conversational and emotional speech in noisy and reverberant acoustic environments. This poses a particular challenge in the search and rescue (SAR) domain, where transcribing conversations among rescue team members is crucial to support real-time decision-making. The scarcity of speech data and associated background noise in SAR scenarios make it difficult to deploy robust speech recognition systems. To address this issue, we have created and made publicly available a German speech dataset called RescueSpeech. This dataset includes real speech recordings from simulated rescue exercises. Additionally, we have released competitive training recipes and pre-trained models. Our study highlights that the performance attained by state-of-the-art methods in this challenging scenario is still far from reaching an acceptable level.

CRMay 27, 2022
Defending Against Stealthy Backdoor Attacks

Sangeet Sagar, Abhinav Bhatt, Abhijith Srinivas Bidaralli

Defenses against security threats have been an interest of recent studies. Recent works have shown that it is not difficult to attack a natural language processing (NLP) model while defending against them is still a cat-mouse game. Backdoor attacks are one such attack where a neural network is made to perform in a certain way on specific samples containing some triggers while achieving normal results on other samples. In this work, we present a few defense strategies that can be useful to counter against such an attack. We show that our defense methodologies significantly decrease the performance on the attacked inputs while maintaining similar performance on benign inputs. We also show that some of our defenses have very less runtime and also maintain similarity with the original inputs.

LGJun 29, 2024Code
Open-Source Conversational AI with SpeechBrain 1.0

Mirco Ravanelli, Titouan Parcollet, Adel Moumen et al.

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.

CLJul 2, 2020
A Bayesian Multilingual Document Model for Zero-shot Topic Identification and Discovery

Santosh Kesiraju, Sangeet Sagar, Ondřej Glembek et al.

In this paper, we present a Bayesian multilingual document model for learning language-independent document embeddings. The model is an extension of BaySMM [Kesiraju et al 2020] to the multilingual scenario. It learns to represent the document embeddings in the form of Gaussian distributions, thereby encoding the uncertainty in its covariance. We propagate the learned uncertainties through linear classifiers that benefit zero-shot cross-lingual topic identification. Our experiments on 17 languages show that the proposed multilingual Bayesian document model performs competitively, when compared to other systems based on large-scale neural networks (LASER, XLM-R, mUSE) on 8 high-resource languages, and outperforms these systems on 9 mid-resource languages. We revisit cross-lingual topic identification in zero-shot settings by taking a deeper dive into current datasets, baseline systems and the languages covered. We identify shortcomings in the existing evaluation protocol (MLDoc dataset), and propose a robust alternative scheme, while also extending the cross-lingual experimental setup to 17 languages. Finally, we consolidate the observations from all our experiments, and discuss points that can potentially benefit the future research works in applications relying on cross-lingual transfers.

CLJun 5, 2020
ELITR Non-Native Speech Translation at IWSLT 2020

Dominik Macháček, Jonáš Kratochvíl, Sangeet Sagar et al.

This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, develop a new end-to-end general ASR system, and a hybrid ASR trained on non-native speech. The provided small validation set prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set.