Ascensión Gallardo-Antolín

h-index19

5papers

83citations

Novelty46%

AI Score37

Ranked #94,706 of 194,257 authors (top 49%)#503 in AS (top 35%)

5 Papers

5.0CVApr 17

Multilevel neural networks with dual-stage feature fusion for human activity recognition

Abeer FathAllah Brery, Ascensión Gallardo-Antolín, Israel Gonzalez-Carrasco et al.

Human activity recognition (HAR) refers to the process of identifying human actions and activities using data collected from sensors. Neural networks, such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, convolutional LSTM, and their hybrid combinations, have demonstrated exceptional performance in various research domains. Developing a multilevel individual or hybrid model for HAR involves strategically integrating multiple networks to capitalize on their complementary strengths. The structural arrangement of these components is a critical factor influencing the overall performance. This study explores a novel framework of a two-level network architecture with dual-stage feature fusion: late fusion, which combines the outputs from the first network level, and intermediate fusion, which integrates the features from both the first and second levels. We evaluated $15$ different network architectures of CNNs, LSTMs, and convolutional LSTMs, incorporating late fusion with and without intermediate fusion, to identify the optimal configuration. Experimental evaluation on two public benchmark datasets demonstrates that architectures incorporating both late and intermediate fusion achieve higher accuracy than those relying on late fusion alone. Moreover, the optimal configuration outperforms baseline models, thereby validating its effectiveness for HAR.

2.1MLSep 7, 2022

Bayesian learning of feature spaces for multitasks problems

Carlos Sevilla-Salcedo, Ascensión Gallardo-Antolín, Vanessa Gómez-Verdejo et al.

This paper introduces a novel approach for multi-task regression that connects Kernel Machines (KMs) and Extreme Learning Machines (ELMs) through the exploitation of the Random Fourier Features (RFFs) approximation of the RBF kernel. In this sense, one of the contributions of this paper shows that for the proposed models, the KM and the ELM formulations can be regarded as two sides of the same coin. These proposed models, termed RFF-BLR, stand on a Bayesian framework that simultaneously addresses two main design goals. On the one hand, it fits multitask regressors based on KMs endowed with RBF kernels. On the other hand, it enables the introduction of a common-across-tasks prior that promotes multioutput sparsity in the ELM view. This Bayesian approach facilitates the simultaneous consideration of both the KM and ELM perspectives enabling (i) the optimisation of the RBF kernel parameter $γ$ within a probabilistic framework, (ii) the optimisation of the model complexity, and (iii) an efficient transfer of knowledge across tasks. The experimental results show that this framework can lead to significant performance improvements compared to the state-of-the-art methods in multitask nonlinear regression.

5.9ASFeb 5, 2024

An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility

Miguel Fernández-Díaz, Ascensión Gallardo-Antolín

Speech intelligibility can be degraded due to multiple factors, such as noisy environments, technical difficulties or biological conditions. This work is focused on the development of an automatic non-intrusive system for predicting the speech intelligibility level in this latter case. The main contribution of our research on this topic is the use of Long Short-Term Memory (LSTM) networks with log-mel spectrograms as input features for this purpose. In addition, this LSTM-based system is further enhanced by the incorporation of a simple attention mechanism that is able to determine the more relevant frames to this task. The proposed models are evaluated with the UA-Speech database that contains dysarthric speech with different degrees of severity. Results show that the attention LSTM architecture outperforms both, a reference Support Vector Machine (SVM)-based system with hand-crafted features and a LSTM-based system with Mean-Pooling.

1.2ASFeb 5, 2024

On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification

Ascensión Gallardo-Antolín, Juan M. Montero

Speech intelligibility can be affected by multiple factors, such as noisy environments, channel distortions or physiological issues. In this work, we deal with the problem of automatic prediction of the speech intelligibility level in this latter case. Starting from our previous work, a non-intrusive system based on LSTM networks with attention mechanism designed for this task, we present two main contributions. In the first one, it is proposed the use of per-frame modulation spectrograms as input features, instead of compact representations derived from them that discard important temporal information. In the second one, two different strategies for the combination of per-frame acoustic log-mel and modulation spectrograms into the LSTM framework are explored: at decision level or late fusion and at utterance level or Weighted-Pooling (WP) fusion. The proposed models are evaluated with the UA-Speech database that contains dysarthric speech with different degrees of severity. On the one hand, results show that attentional LSTM networks are able to adequately modeling the modulation spectrograms sequences producing similar classification rates as in the case of log-mel spectrograms. On the other hand, both combination strategies, late and WP fusion, outperform the single-feature systems, suggesting that per-frame log-mel and modulation spectrograms carry complementary information for the task of speech intelligibility prediction, than can be effectively exploited by the LSTM-based architectures, being the system with the WP fusion strategy and Attention-Pooling the one that achieves best results.

2.3ASJan 26, 2024

Enhancement of a Text-Independent Speaker Verification System by using Feature Combination and Parallel-Structure Classifiers

Kerlos Atia Abdalmalak, Ascensión Gallardo-Antol'in

Speaker Verification (SV) systems involve mainly two individual stages: feature extraction and classification. In this paper, we explore these two modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for performing robust speaker verification. The acoustic parameters used in the proposed system are: Mel Frequency Cepstral Coefficients (MFCC), their first and second derivatives (Deltas and Delta- Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP), and Relative Spectral Transform - Perceptual Linear Predictive (RASTA-PLP). In this paper, a complete comparison of different combinations of the previous features is discussed. On the other hand, the major weakness of a conventional Support Vector Machine (SVM) classifier is the use of generic traditional kernel functions to compute the distances among data points. However, the kernel function of an SVM has great influence on its performance. In this work, we propose the combination of two SVM-based classifiers with different kernel functions: Linear kernel and Gaussian Radial Basis Function (RBF) kernel with a Logistic Regression (LR) classifier. The combination is carried out by means of a parallel structure approach, in which different voting rules to take the final decision are considered. Results show that significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers either with clean speech or in the presence of noise. Finally, to enhance the system more in noisy environments, the inclusion of the multiband noise removal technique as a preprocessing stage is proposed.