Julián D. Arias-Londoño

LG
Semantic Scholar Profile
h-index38
7papers
52citations
Novelty41%
AI Score41

7 Papers

CVFeb 12
Calibrated Bayesian Deep Learning for Explainable Decision Support Systems Based on Medical Imaging

Hua Xu, Julián D. Arias-Londoño, Juan I. Godino-Llorente

In critical decision support systems based on medical imaging, the reliability of AI-assisted decision-making is as relevant as predictive accuracy. Although deep learning models have demonstrated significant accuracy, they frequently suffer from miscalibration, manifested as overconfidence in erroneous predictions. To facilitate clinical acceptance, it is imperative that models quantify uncertainty in a manner that correlates with prediction correctness, allowing clinicians to identify unreliable outputs for further review. In order to address this necessity, the present paper proposes a generalizable probabilistic optimization framework grounded in Bayesian deep learning. Specifically, a novel Confidence-Uncertainty Boundary Loss (CUB-Loss) is introduced that imposes penalties on high-certainty errors and low-certainty correct predictions, explicitly enforcing alignment between prediction correctness and uncertainty estimates. Complementing this training-time optimization, a Dual Temperature Scaling (DTS) strategy is devised for post-hoc calibration, further refining the posterior distribution to improve intuitive explainability. The proposed framework is validated on three distinct medical imaging tasks: automatic screening of pneumonia, diabetic retinopathy detection, and identification of skin lesions. Empirical results demonstrate that the proposed approach achieves consistent calibration improvements across diverse modalities, maintains robust performance in data-scarce scenarios, and remains effective on severely imbalanced datasets, underscoring its potential for real clinical deployment.

LGOct 9, 2022
An Instance Selection Algorithm for Big Data in High imbalanced datasets based on LSH

Germán E. Melo-Acosta, Freddy Duitama-Muñoz, Julián D. Arias-Londoño

Training of Machine Learning (ML) models in real contexts often deals with big data sets and high-class imbalance samples where the class of interest is unrepresented (minority class). Practical solutions using classical ML models address the problem of large data sets using parallel/distributed implementations of training algorithms, approximate model-based solutions, or applying instance selection (IS) algorithms to eliminate redundant information. However, the combined problem of big and high imbalanced datasets has been less addressed. This work proposes three new methods for IS to be able to deal with large and imbalanced data sets. The proposed methods use Locality Sensitive Hashing (LSH) as a base clustering technique, and then three different sampling methods are applied on top of the clusters (or buckets) generated by LSH. The algorithms were developed in the Apache Spark framework, guaranteeing their scalability. The experiments carried out in three different datasets suggest that the proposed IS methods can improve the performance of a base ML model between 5% and 19% in terms of the geometric mean.

AISep 19, 2024
Swine Diet Design using Multi-objective Regionalized Bayesian Optimization

Gabriel D. Uribe-Guerra, Danny A. Múnera-Ramírez, Julián D. Arias-Londoño

The design of food diets in the context of animal nutrition is a complex problem that aims to develop cost-effective formulations while balancing minimum nutritional content. Traditional approaches based on theoretical models of metabolic responses and concentrations of digestible energy in raw materials face limitations in incorporating zootechnical or environmental variables affecting the performance of animals and including multiple objectives aligned with sustainable development policies. Recently, multi-objective Bayesian optimization has been proposed as a promising heuristic alternative able to deal with the combination of multiple sources of information, multiple and diverse objectives, and with an intrinsic capacity to deal with uncertainty in the measurements that could be related to variability in the nutritional content of raw materials. However, Bayesian optimization encounters difficulties in high-dimensional search spaces, leading to exploration predominantly at the boundaries. This work analyses a strategy to split the search space into regions that provide local candidates termed multi-objective regionalized Bayesian optimization as an alternative to improve the quality of the Pareto set and Pareto front approximation provided by BO in the context of swine diet design. Results indicate that this regionalized approach produces more diverse non-dominated solutions compared to the standard multi-objective Bayesian optimization. Besides, the regionalized strategy was four times more effective in finding solutions that outperform those identified by a stochastic programming approach referenced in the literature. Experiments using batches of query candidate solutions per iteration show that the optimization process can also be accelerated without compromising the quality of the Pareto set approximation during the initial, most critical phase of optimization.

ASMar 4, 2024
NeuroVoz: a Castillian Spanish corpus of parkinsonian speech

Janaína Mendes-Laureano, Jorge A. Gómez-García, Alejandro Guerrero-López et al.

The screening of Parkinson's Disease (PD) through speech is hindered by a notable lack of publicly available datasets in different languages. This fact limits the reproducibility and further exploration of existing research. To address this gap, this manuscript presents the NeuroVoz corpus consisting of 112 native Castilian-Spanish speakers, including 58 healthy controls and 54 individuals with PD, all recorded in ON state. The corpus showcases a diverse array of speech tasks: sustained vowels; diadochokinetic tests; 16 Listen-and-Repeat utterances; and spontaneous monologues. The dataset is also complemented with subjective assessments of voice quality performed by an expert according to the GRBAS scale (Grade/Roughness/Breathiness/Asthenia/Strain), as well as annotations with a thorough examination of phonation quality, intensity, speed, resonance, intelligibility, and prosody. The corpus offers a substantial resource for the exploration of the impact of PD on speech. This data set has already supported several studies, achieving a benchmark accuracy of 89% for the screening of PD. Despite these advances, the broader challenge of conducting a language-agnostic, cross-corpora analysis of Parkinsonian speech patterns remains open.

NCSep 1, 2025
Automatic Screening of Parkinson's Disease from Visual Explorations

Maria F. Alcala-Durand, J. Camilo Puerta-Acevedo, Julián D. Arias-Londoño et al.

Eye movements can reveal early signs of neurodegeneration, including those associated with Parkinson's Disease (PD). This work investigates the utility of a set of gaze-based features for the automatic screening of PD from different visual exploration tasks. For this purpose, a novel methodology is introduced, combining classic fixation/saccade oculomotor features (e.g., saccade count, fixation duration, scanned area) with features derived from gaze clusters (i.e., regions with a considerable accumulation of fixations). These features are automatically extracted from six exploration tests and evaluated using different machine learning classifiers. A Mixture of Experts ensemble is used to integrate outputs across tests and both eyes. Results show that ensemble models outperform individual classifiers, achieving an Area Under the Receiving Operating Characteristic Curve (AUC) of 0.95 on a held-out test set. The findings support visual exploration as a non-invasive tool for early automatic screening of PD.

LGMay 31, 2025
Imputation of Missing Data in Smooth Pursuit Eye Movements Using a Self-Attention-based Deep Learning Approach

Mehdi Bejani, Guillermo Perez-de-Arenaza-Pozo, Julián D. Arias-Londoño et al.

Missing data is a relevant issue in time series, especially in biomedical sequences such as those corresponding to smooth pursuit eye movements, which often contain gaps due to eye blinks and track losses, complicating the analysis and extraction of meaningful biomarkers. In this paper, a novel imputation framework is proposed using Self-Attention-based Imputation networks for time series, which leverages the power of deep learning and self-attention mechanisms to impute missing data. We further refine the imputed data using a custom made autoencoder, tailored to represent smooth pursuit eye movement sequences. The proposed approach was implemented using 5,504 sequences from 172 Parkinsonian patients and healthy controls. Results show a significant improvement in the accuracy of reconstructed eye movement sequences with respect to other state of the art techniques, substantially reducing the values for common time domain error metrics such as the mean absolute error, mean relative error, and root mean square error, while also preserving the signal's frequency domain characteristics. Moreover, it demonstrates robustness when large intervals of data are missing. This method offers an alternative solution for robustly handling missing data in time series, enhancing the reliability of smooth pursuit analysis for the screening and monitoring of neurodegenerative disorders.

CLOct 13, 2021
Fake News Detection in Spanish Using Deep Learning Techniques

Kevin Martínez-Gallego, Andrés M. Álvarez-Ortiz, Julián D. Arias-Londoño

This paper addresses the problem of fake news detection in Spanish using Machine Learning techniques. It is fundamentally the same problem tackled for the English language; however, there is not a significant amount of publicly available and adequately labeled fake news in Spanish to effectively train a Machine Learning model, similarly to those proposed for the English language. Therefore, this work explores different training strategies and architectures to establish a baseline for further research in this area. Four datasets were used, two in English and two in Spanish, and four experimental schemes were tested, including a baseline with classical Machine Learning models, trained and validated using a small dataset in Spanish. The remaining schemes include state-of-the-art Deep Learning models trained (or fine-tuned) and validated in English, trained and validated in Spanish, and fitted in English and validated with automatic translated Spanish sentences. The Deep Learning architectures were built on top of different pre-trained Word Embedding representations, including GloVe, ELMo, BERT, and BETO (a BERT version trained on a large corpus in Spanish). According to the results, the best strategy was a combination of a pre-trained BETO model and a Recurrent Neural Network based on LSTM layers, yielding an accuracy of up to 80%; nonetheless, a baseline model using a Random Forest estimator obtained similar outcomes. Additionally, the translation strategy did not yield acceptable results because of the propagation error; there was also observed a significant difference in models performance when trained in English or Spanish, mainly attributable to the number of samples available for each language.