Antonio Salmerón

LG
h-index: 27
Papers: 6
Citations: 47
Novelty: 16%
AI Score: 30

6 Papers

SE | Nov 14, 2024 | Code
How do Machine Learning Models Change?

Joel Castaño, Rafael Cabañas, Antonio Salmerón et al.

The proliferation of Machine Learning (ML) models and their open-source implementations has transformed Artificial Intelligence research and applications. Platforms like Hugging Face (HF) enable this evolving ecosystem, yet a large-scale longitudinal study of how these models change is lacking. This study addresses this gap by analyzing over 680,000 commits from 100,000 models and 2,251 releases from 202 of these models on HF using repository mining and longitudinal methods. We apply an extended ML change taxonomy to classify commits and use Bayesian networks to model temporal patterns in commit and release activities. Our findings show that commit activities align with established data science methodologies, such as the Cross-Industry Standard Process for Data Mining (CRISP-DM), emphasizing iterative refinement. Release patterns tend to consolidate significant updates, particularly in model outputs, sharing, and documentation, distinguishing them from granular commits. Furthermore, projects with higher popularity exhibit distinct evolutionary paths, often starting from a more mature baseline with fewer foundational commits in their public history. In contrast, those with intensive collaboration show unique documentation and technical evolution patterns. These insights enhance the understanding of model changes on community platforms and provide valuable guidance for best practices in model maintenance.

LG | Apr 4, 2017 | Code
AMIDST: a Java Toolbox for Scalable Probabilistic Machine Learning

Andrés R. Masegosa, Ana M. Martínez, Darío Ramos-López et al.

The AMIDST Toolbox is software for scalable probabilistic machine learning with a special focus on (massive) streaming data. The toolbox supports a flexible modeling language based on probabilistic graphical models with latent variables and temporal dependencies. The specified models can be learnt from large data sets using parallel or distributed implementations of Bayesian learning algorithms for either streaming or batch data. These algorithms are based on a flexible variational message passing scheme, which supports discrete and continuous variables from a wide range of probability distributions. AMIDST also leverages existing functionality and algorithms by interfacing with software tools such as Flink, Spark, MOA, Weka, R and HUGIN. AMIDST is an open-source toolbox written in Java and available at http://www.amidsttoolbox.com under the Apache Software License version 2.0.

AI | Nov 17, 2025
Causal computations in Semi Markovian Structural Causal Models using divide and conquer

Anna Rodum Bjøru, Rafael Cabañas, Helge Langseth et al.

Recently, Bjøru et al. proposed a novel divide-and-conquer algorithm for bounding counterfactual probabilities in structural causal models (SCMs). They assumed that the SCMs were learned from purely observational data, leading to an imprecise characterization of the marginal distributions of exogenous variables. Their method leveraged the canonical representation of structural equations to decompose a general SCM with high-cardinality exogenous variables into a set of sub-models with low-cardinality exogenous variables. These sub-models had precise marginals over the exogenous variables and therefore admitted efficient exact inference. The aggregated results were used to bound counterfactual probabilities in the original model. The approach was developed for Markovian models, where each exogenous variable affects only a single endogenous variable. In this paper, we investigate extending the methodology to semi-Markovian SCMs, where exogenous variables may influence multiple endogenous variables. Such models are capable of representing confounding relationships that Markovian models cannot. We illustrate the challenges of this extension using a minimal example, which motivates a set of alternative solution strategies. These strategies are evaluated both theoretically and through a computational study.
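
The distinction between Markovian and semi-Markovian models drawn in this abstract can be illustrated with a minimal sketch. It is not code from the paper: the function name `classify_scm` and the dictionary encoding (each exogenous variable mapped to the endogenous variables it affects) are hypothetical, and the check uses only the definition given above, namely that a model is Markovian when every exogenous variable affects a single endogenous variable.

```python
from typing import Dict, List


def classify_scm(exo_children: Dict[str, List[str]]) -> str:
    """Classify an SCM from its exogenous-to-endogenous edges.

    exo_children is a hypothetical encoding: each key is an exogenous
    variable and each value lists the endogenous variables it affects.
    """
    for children in exo_children.values():
        # An exogenous variable with more than one distinct endogenous
        # child acts as a shared (confounding) cause.
        if len(set(children)) > 1:
            return "semi-Markovian"
    return "Markovian"


# Each exogenous variable affects exactly one endogenous variable.
markovian_model = {"U_x": ["X"], "U_y": ["Y"]}

# U_xy affects both X and Y, i.e. it confounds them.
confounded_model = {"U_xy": ["X", "Y"], "U_z": ["Z"]}

print(classify_scm(markovian_model))
print(classify_scm(confounded_model))
```

In the second model the shared variable `U_xy` represents the kind of confounding relationship that, according to the abstract, Markovian models cannot express.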

AI | Jan 18, 2024
Bayesian Networks for Causal Analysis in Socioecological Systems

Rafael Cabañas, Ana D. Maldonado, María Morales et al.

Causal and counterfactual reasoning are emerging directions in data science that allow us to reason about hypothetical scenarios. This is particularly useful in fields like environmental and ecological sciences, where interventional data are usually not available. Structural causal models are probabilistic models for causal analysis that simplify this kind of reasoning due to their graphical representation. They can be regarded as extensions of Bayesian networks, a well-known modeling tool commonly used in environmental and ecological problems. The main contribution of this paper is to analyze the relations of necessity and sufficiency between the variables of a socioecological system using counterfactual reasoning with Bayesian networks. In particular, we consider a case study involving socioeconomic factors and land uses in southern Spain. In addition, this paper aims to be a coherent overview of the fundamental concepts for applying counterfactual reasoning, so that environmental researchers with a background in Bayesian networks can easily take advantage of the structural causal model formalism.

LG | Aug 29, 2019
InferPy: Probabilistic Modeling with Deep Neural Networks Made Easy

Javier Cózar, Rafael Cabañas, Antonio Salmerón et al.

InferPy is a Python package for probabilistic modeling with deep neural networks. It defines a user-friendly API that trades off model complexity against ease of use, unlike other libraries that focus on handling very general probabilistic models at the cost of a more complex API. In particular, this package allows users to define, learn and evaluate general hierarchical probabilistic models containing deep neural networks in a compact and simple way. InferPy is built on top of TensorFlow Probability and Keras.

LG | Aug 9, 2019
Probabilistic Models with Deep Neural Networks

Andrés R. Masegosa, Rafael Cabañas, Helge Langseth et al.

Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to (i) very restricted model classes where exact or approximate probabilistic inference was feasible, and (ii) small or medium-sized data sets that fit within the main memory of the computer. However, developments in variational inference, a general form of approximate probabilistic inference that originated in statistical physics, are allowing probabilistic modeling to overcome these restrictions: (i) approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computation engines allow probabilistic modeling to be applied to massive data sets. One important practical consequence of these advances is the possibility of including deep neural networks within a probabilistic model to capture complex non-linear stochastic relationships between random variables. These advances, in conjunction with the release of novel probabilistic modeling toolboxes, have greatly expanded the scope of application of probabilistic models, and allow these models to take advantage of the recent strides made by the deep learning community. In this paper we review the main concepts, methods and tools needed to use deep neural networks within a probabilistic modeling framework.