Pablo Peso Parada

SD
h-index: 24
Papers: 4
Citations: 39
Novelty: 49%
AI Score: 37

4 Papers

SD · Jul 24, 2023
Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics

Umberto Michieli, Pablo Peso Parada, Mete Ozay

Keyword Spotting (KWS) models on embedded devices should adapt quickly to new user-defined words without forgetting previous ones. Embedded devices have limited storage and computational resources, so they cannot save samples or update large models. We consider the setup of embedded online continual learning (EOCL), where KWS models with a frozen backbone are trained to incrementally recognize new words from a non-repeated stream of samples, seen one at a time. To this end, we propose Temporal Aware Pooling (TAP), which constructs an enriched feature space by computing high-order moments of speech features extracted by a pre-trained backbone. Our method, TAP-SLDA, updates a Gaussian model for each class on the enriched feature space to effectively use audio representations. In experimental analyses, TAP-SLDA outperforms competitors on several setups, backbones, and baselines, bringing a relative average gain of 11.3% on the GSC dataset.
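The pooling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which moments TAP uses or how they are normalized, so the choice of mean, standard deviation, and standardized central moments up to `max_order` is an assumption, as are the function name and the toy features.

```python
import math

def temporal_moment_pooling(frames, max_order=4):
    """Pool T frames of D features into one vector of per-dimension statistics.

    Assumed layout (illustrative only): means, standard deviations, then
    standardized central moments of order 3..max_order.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    # Regroup the T x D input into D sequences, one per feature dimension.
    columns = [[frame[d] for frame in frames] for d in range(dim)]
    means = [sum(col) / num_frames for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / num_frames)
        for col, m in zip(columns, means)
    ]
    pooled = means + stds
    eps = 1e-8  # avoids division by zero for constant features
    for order in range(3, max_order + 1):
        for col, m, s in zip(columns, means, stds):
            central = sum((x - m) ** order for x in col) / num_frames
            pooled.append(central / (s ** order + eps))
    return pooled

# Toy stand-in for backbone outputs: 3 frames, 2 feature dimensions.
features = [[0.0, 1.0], [1.0, 1.0], [2.0, 4.0]]
pooled = temporal_moment_pooling(features)
print(len(pooled))  # 2 dimensions x 4 statistics
```

The pooled vector has a fixed length regardless of the number of frames, which is what lets a per-class Gaussian model be updated from one sample at a time.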

AS · Jan 23, 2024
Locality enhanced dynamic biasing and sampling strategies for contextual ASR

Md Asif Jalal, Pablo Peso Parada, George Pavlidis et al.

Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant rare phrases. Contextual biasing (CB) modules bias the ASR model towards such contextually relevant phrases. During training, a list of biasing phrases is selected from a large pool of phrases following a sampling strategy. In this work, we first analyse different sampling strategies to provide insights into the training of CB for ASR, with correlation plots between the bias embeddings among various training stages. Second, we introduce a neighbourhood attention (NA) that localizes self-attention (SA) to the nearest neighbouring frames to further refine the CB output. The results show that the proposed approach provides on average a 25.84% relative WER improvement on LibriSpeech sets and rare-word evaluation compared to the baseline.
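As a rough illustration of localizing self-attention to neighbouring frames, the sketch below restricts each frame's attention to a fixed window around it. The window parameter `radius`, the scaled dot-product scoring, and the function name are assumptions made for illustration; the abstract does not specify how the paper's NA module is parameterized.

```python
import math

def neighbourhood_attention(queries, keys, values, radius=1):
    """Scaled dot-product attention where frame i only sees frames within `radius` of i."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for i, q in enumerate(queries):
        lo = max(0, i - radius)
        hi = min(len(keys), i + radius + 1)
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in range(lo, hi)]
        # Numerically stable softmax over the local window only.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out = [0.0] * len(values[0])
        for w, j in zip(weights, range(lo, hi)):
            for d, v in enumerate(values[j]):
                out[d] += (w / total) * v
        outputs.append(out)
    return outputs

# Toy frames; with radius=0 each frame attends only to itself.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
identity = neighbourhood_attention(frames, frames, frames, radius=0)
local = neighbourhood_attention(frames, frames, frames, radius=1)
```

Compared with full self-attention, the cost per frame depends on the window size rather than on the sequence length.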

CL · Sep 23, 2025
Retrieval Augmented Generation based context discovery for ASR

Dimitrios Siskos, Stavros Papadopoulos, Pablo Peso Parada et al.

This work investigates retrieval augmented generation as an efficient strategy for automatic context discovery in context-aware Automatic Speech Recognition (ASR) systems, in order to improve transcription accuracy in the presence of rare or out-of-vocabulary terms. However, identifying the right context automatically remains an open challenge. This work proposes an efficient embedding-based retrieval approach for automatic context discovery in ASR. To contextualize its effectiveness, two alternatives based on large language models (LLMs) are also evaluated: (1) LLM-based context generation via prompting, and (2) post-recognition transcript correction using LLMs. Experiments on TED-LIUMv3, Earnings21 and SPGISpeech demonstrate that the proposed approach reduces WER by up to 17% (percentage difference) relative to using no context, while the oracle context results in a reduction of up to 24.1%.
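Embedding-based retrieval of this kind generally reduces to ranking candidate context entries by similarity to a query embedding. The sketch below uses cosine similarity over hand-written toy vectors; the corpus entries, the vectors, the `top_k` parameter, and the function names are all hypothetical, and the abstract does not state which embedding model or similarity measure the paper uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_context(query_vec, corpus, top_k=2):
    """Return the texts of the top_k corpus entries most similar to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Hypothetical (text, embedding) pairs; real embeddings would come from a model.
corpus = [
    ("quarterly earnings guidance", [0.9, 0.1, 0.0]),
    ("neural network training", [0.0, 0.2, 0.9]),
    ("revenue and operating margin", [0.8, 0.3, 0.1]),
]
context = retrieve_context([1.0, 0.2, 0.0], corpus)
```

The retrieved texts would then be supplied to the context-aware recognizer as biasing context.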

SD · Oct 15, 2015
Evaluating the Non-Intrusive Room Acoustics Algorithm with the ACE Challenge

Pablo Peso Parada, Dushyant Sharma, Toon van Waterschoot et al.

We present a single-channel, data-driven method for non-intrusive estimation of full-band reverberation time and full-band direct-to-reverberant ratio. The method extracts a number of features from reverberant speech and builds a model using a recurrent neural network to estimate the reverberant acoustic parameters. We explore three configurations by including different data and also by combining the recurrent neural network estimates using a support vector machine. Our best method to estimate DRR provides a Root Mean Square Deviation (RMSD) of 3.84 dB and an RMSD of 43.19% for T60 estimation.
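For reference, RMSD between estimated and ground-truth values is the square root of the mean squared difference. The sketch below shows that standard definition on made-up numbers; how the paper converts the T60 error into a percentage is not stated in the abstract and is not modelled here.

```python
import math

def rmsd(estimates, targets):
    """Root mean square deviation between paired estimates and ground-truth values."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, targets)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up DRR values in dB, for illustration only.
error_db = rmsd([0.0, 0.0], [2.0, 2.0])
```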