Alessio Benavoli

LG
h-index34
24papers
1,254citations
Novelty45%
AI Score43

24 Papers

OCApr 12, 2016
A probabilistic interpretation of set-membership filtering: application to polynomial systems through polytopic bounding

Alessio Benavoli, Dario Piga

Set-membership estimation is usually formulated in the context of set-valued calculus and no probabilistic calculations are necessary. In this paper, we show that set-membership estimation can be equivalently formulated in the probabilistic setting by employing sets of probability measures. Inference in set-membership estimation is thus carried out by computing expectations with respect to the updated set of probability measures P as in the probabilistic case. In particular, it is shown that inference can be performed by solving a particular semi-infinite linear programming problem, which is a special case of the truncated moment problem in which only the zero-th order moment is known (i.e., the support). By writing the dual of the above semi-infinite linear programming problem, it is shown that, if the nonlinearities in the measurement and process equations are polynomial and if the bounding sets for initial state, process and measurement noises are described by polynomial inequalities, then an approximation of this semi-infinite linear programming problem can efficiently be obtained by using the theory of sum-of-squares polynomial optimization. We then derive a smart greedy procedure to compute a polytopic outer-approximation of the true membership-set, by computing the minimum-volume polytope that outer-bounds the set that includes all the means computed with respect to P.

AIAug 4, 2022
Credal Valuation Networks for Machine Reasoning Under Uncertainty

Branko Ristic, Alessio Benavoli, Sanjeev Arulampalam

Contemporary undertakings provide limitless opportunities for widespread application of machine reasoning and artificial intelligence in situations characterised by uncertainty, hostility and sheer volume of data. The paper develops a valuation network as a graphical system for higher-level fusion and reasoning under uncertainty in support of the human operators. Valuations, which are mathematical representation of (uncertain) knowledge and collected data, are expressed as credal sets, defined as coherent interval probabilities in the framework of imprecise probability theory. The basic operations with such credal sets, combination and marginalisation, are defined to satisfy the axioms of a valuation algebra. A practical implementation of the credal valuation network is discussed and its utility demonstrated on a small scale example.

LGFeb 1, 2023
Learning Choice Functions with Gaussian Processes

Alessio Benavoli, Dario Azzimonti, Dario Piga

In consumer theory, ranking available objects by means of preference relations yields the most common description of individual choices. However, preference-based models assume that individuals: (1) give their preferences only between pairs of objects; (2) are always able to pick the best preferred object. In many situations, they may be instead choosing out of a set with more than two elements and, because of lack of information and/or incomparability (objects with contradictory characteristics), they may not able to select a single most preferred object. To address these situations, we need a choice-model which allows an individual to express a set-valued choice. Choice functions provide such a mathematical framework. We propose a Gaussian Process model to learn choice functions from choice-data. The proposed model assumes a multiple utility representation of a choice function based on the concept of Pareto rationalization, and derives a strategy to learn both the number and the values of these latent multiple utilities. Simulation experiments demonstrate that the proposed model outperforms the state-of-the-art methods.

AIDec 29, 2025
Why AI Safety Requires Uncertainty, Incomplete Preferences, and Non-Archimedean Utilities

Alessio Benavoli, Alessandro Facchini, Marco Zaffalon

How can we ensure that AI systems are aligned with human values and remain safe? We can study this problem through the frameworks of the AI assistance and the AI shutdown games. The AI assistance problem concerns designing an AI agent that helps a human to maximise their utility function(s). However, only the human knows these function(s); the AI assistant must learn them. The shutdown problem instead concerns designing AI agents that: shut down when a shutdown button is pressed; neither try to prevent nor cause the pressing of the shutdown button; and otherwise accomplish their task competently. In this paper, we show that addressing these challenges requires AI agents that can reason under uncertainty and handle both incomplete and non-Archimedean preferences.

LGJan 14
Constraint- and Score-Based Nonlinear Granger Causality Discovery with Kernels

Fiona Murphy, Alessio Benavoli

Kernel-based methods are used in the context of Granger Causality to enable the identification of nonlinear causal relationships between time series variables. In this paper, we show that two state of the art kernel-based Granger Causality (GC) approaches can be theoretically unified under the framework of Kernel Principal Component Regression (KPCR), and introduce a method based on this unification, demonstrating that this approach can improve causal identification. Additionally, we introduce a Gaussian Process score-based model with Smooth Information Criterion penalisation on the marginal likelihood, and demonstrate improved performance over existing state of the art time-series nonlinear causal discovery methods. Furthermore, we propose a contemporaneous causal identification algorithm, fully based on GC, using the proposed score-based $GP_{SIC}$ method, and compare its performance to a state of the art contemporaneous time series causal discovery algorithm.

LGMar 18, 2024
A tutorial on learning from preferences and choices with Gaussian Processes

Alessio Benavoli, Dario Azzimonti

Preference modelling lies at the intersection of economics, decision theory, machine learning and statistics. By understanding individuals' preferences and how they make choices, we can build products that closely match their expectations, paving the way for more efficient and personalised applications across a wide range of domains. The objective of this tutorial is to present a cohesive and comprehensive framework for preference learning with Gaussian Processes (GPs), demonstrating how to seamlessly incorporate rationality principles (from economics and decision theory) into the learning process. By suitably tailoring the likelihood function, this framework enables the construction of preference learning models that encompass random utility models, limits of discernment, and scenarios with multiple conflicting utilities for both object- and label-preference. This tutorial builds upon established research while simultaneously introducing some novel GP-based models to address specific gaps in the existing literature.

MLFeb 8, 2025
dynoGP: Deep Gaussian Processes for dynamic system identification

Alessio Benavoli, Dario Piga, Marco Forgione et al.

In this work, we present a novel approach to system identification for dynamical systems, based on a specific class of Deep Gaussian Processes (Deep GPs). These models are constructed by interconnecting linear dynamic GPs (equivalent to stochastic linear time-invariant dynamical systems) and static GPs (to model static nonlinearities). Our approach combines the strengths of data-driven methods, such as those based on neural network architectures, with the ability to output a probability distribution. This offers a more comprehensive framework for system identification that includes uncertainty quantification. Using both simulated and real-world data, we demonstrate the effectiveness of the proposed approach.

LGFeb 10, 2025
The AI off-switch problem as a signalling game: bounded rationality and incomparability

Alessio Benavoli, Alessandro Facchini, Marco Zaffalon

The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human's utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.

MLDec 17, 2021
Correlated Product of Experts for Sparse Gaussian Process Regression

Manuel Schürch, Dario Azzimonti, Alessio Benavoli et al.

Gaussian processes (GPs) are an important tool in machine learning and statistics with applications ranging from social and natural science through engineering. They constitute a powerful kernelized non-parametric method with well-calibrated uncertainty estimates, however, off-the-shelf GP inference procedures are limited to datasets with several thousand data points because of their cubic computational complexity. For this reason, many sparse GPs techniques have been developed over the past years. In this paper, we focus on GP regression tasks and propose a new approach based on aggregating predictions from several local and correlated experts. Thereby, the degree of correlation between the experts can vary between independent up to fully correlated experts. The individual predictions of the experts are aggregated taking into account their correlation resulting in consistent uncertainty estimates. Our method recovers independent Product of Experts, sparse GP and full GP in the limiting cases. The presented framework can deal with a general kernel function and multiple variables, and has a time and space complexity which is linear in the number of experts and data samples, which makes our approach highly scalable. We demonstrate superior performance, in a time vs. accuracy sense, of our proposed method against state-of-the-art GP approximation methods for synthetic as well as several real-world datasets with deterministic and stochastic optimization.

MLOct 15, 2021
Choice functions based multi-objective Bayesian optimisation

Alessio Benavoli, Dario Azzimonti, Dario Piga

In this work we introduce a new framework for multi-objective Bayesian optimisation where the multi-objective functions can only be accessed via choice judgements, such as ``I pick options A,B,C among this set of five options A,B,C,D,E''. The fact that the option D is rejected means that there is at least one option among the selected ones A,B,C that I strictly prefer over D (but I do not have to specify which one). We assume that there is a latent vector function f for some dimension $n_e$ which embeds the options into the real vector space of dimension n, so that the choice set can be represented through a Pareto set of non-dominated options. By placing a Gaussian process prior on f and deriving a novel likelihood model for choice data, we propose a Bayesian framework for choice functions learning. We then apply this surrogate model to solve a novel multi-objective Bayesian optimisation from choice data problem.

MLSep 28, 2021
Gaussian Processes to speed up MCMC with automatic exploratory-exploitation effect

Alessio Benavoli, Jason Wyse, Arthur White

We present a two-stage Metropolis-Hastings algorithm for sampling probabilistic models, whose log-likelihood is computationally expensive to evaluate, by using a surrogate Gaussian Process (GP) model. The key feature of the approach, and the difference w.r.t. previous works, is the ability to learn the target distribution from scratch (while sampling), and so without the need of pre-training the GP. This is fundamental for automatic and inference in Probabilistic Programming Languages In particular, we present an alternative first stage acceptance scheme by marginalising out the GP distributed function, which makes the acceptance ratio explicitly dependent on the variance of the GP. This approach is extended to Metropolis-Adjusted Langevin algorithm (MALA).

LGJul 27, 2021
Bayesian Optimisation for Sequential Experimental Design with Applications in Additive Manufacturing

Mimi Zhang, Andrew Parnell, Dermot Brabazon et al.

Bayesian optimization (BO) is an approach to globally optimizing black-box objective functions that are expensive to evaluate. BO-powered experimental design has found wide application in materials science, chemistry, experimental physics, drug development, etc. This work aims to bring attention to the benefits of applying BO in designing experiments and to provide a BO manual, covering both methodology and software, for the convenience of anyone who wants to apply or learn BO. In particular, we briefly explain the BO technique, review all the applications of BO in additive manufacturing, compare and exemplify the features of different open BO libraries, unlock new potential applications of BO to other types of data (e.g., preferential output). This article is aimed at readers with some understanding of Bayesian methods, but not necessarily with knowledge of additive manufacturing; the software performance overview and implementation instructions are instrumental for any experimental-design practitioner. Moreover, our review in the field of additive manufacturing highlights the current knowledge and technological trends of BO. This article has a supplementary material online.

MLMay 9, 2021
Bayesian Kernelised Test of (In)dependence with Mixed-type Variables

Alessio Benavoli, Cassio de Campos

A fundamental task in AI is to assess (in)dependence between mixed-type variables (text, image, sound). We propose a Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model. The new measure of (in)dependence allows us to answer some fundamental questions: Based on data, are (mixed-type) variables independent? How likely is dependence/independence to hold? How high is the probability that two mixed-type variables are more than just weakly dependent? We theoretically show the properties of the approach, as well as algorithms for fast computation with it. We empirically demonstrate the effectiveness of the proposed method by analysing its performance and by comparing it with other frequentist and Bayesian approaches on a range of datasets and tasks with mixed-type variables.

MLDec 12, 2020
A unified framework for closed-form nonparametric regression, classification, preference and mixed problems with Skew Gaussian Processes

Alessio Benavoli, Dario Azzimonti, Dario Piga

Skew-Gaussian processes (SkewGPs) extend the multivariate Unified Skew-Normal distributions over finite dimensional vectors to distribution over functions. SkewGPs are more general and flexible than Gaussian processes, as SkewGPs may also represent asymmetric distributions. In a recent contribution we showed that SkewGP and probit likelihood are conjugate, which allows us to compute the exact posterior for non-parametric binary classification and preference learning. In this paper, we generalize previous results and we prove that SkewGP is conjugate with both the normal and affine probit likelihood, and more in general, with their product. This allows us to (i) handle classification, preference, numeric and ordinal regression, and mixed problems in a unified framework; (ii) derive closed-form expression for the corresponding posterior distributions. We show empirically that the proposed framework based on SkewGP provides better performance than Gaussian processes in active learning and Bayesian (constrained) optimization. These two tasks are fundamental for design of experiments and in Data Science.

MLSep 17, 2020
Time series forecasting with Gaussian Processes needs priors

Giorgio Corani, Alessio Benavoli, Marco Zaffalon

Automatic forecasting is the task of receiving a time series and returning a forecast for the next time steps without any human intervention. Gaussian Processes (GPs) are a powerful tool for modeling time series, but so far there are no competitive approaches for automatic forecasting based on GPs. We propose practical solutions to two problems: automatic selection of the optimal kernel and reliable estimation of the hyperparameters. We propose a fixed composition of kernels, which contains the components needed to model most time series: linear trend, periodic patterns, and other flexible kernel for modeling the non-linear trend. Not all components are necessary to model each time series; during training the unnecessary components are automatically made irrelevant via automatic relevance determination (ARD). We moreover assign priors to the hyperparameters, in order to keep the inference within a plausible range; we design such priors through an empirical Bayes approach. We present results on many time series of different types; our GP model is more accurate than state-of-the-art time series models. Thanks to the priors, a single restart is enough the estimate the hyperparameters; hence the model is also fast to train.

LGAug 15, 2020
Preferential Bayesian optimisation with Skew Gaussian Processes

Alessio Benavoli, Dario Azzimonti, Dario Piga

Preferential Bayesian optimisation (PBO) deals with optimisation problems where the objective function can only be accessed via preference judgments, such as "this is better than that" between two candidate solutions (like in A/B tests or recommender systems). The state-of-the-art approach to PBO uses a Gaussian process to model the preference function and a Bernoulli likelihood to model the observed pairwise comparisons. Laplace's method is then employed to compute posterior inferences and, in particular, to build an appropriate acquisition function. In this paper, we prove that the true posterior distribution of the preference function is a Skew Gaussian Process (SkewGP), with highly skewed pairwise marginals and, thus, show that Laplace's method usually provides a very poor approximation. We then derive an efficient method to compute the exact SkewGP posterior and use it as surrogate model for PBO employing standard acquisition functions (Upper Credible Bound, etc.). We illustrate the benefits of our exact PBO-SkewGP in a variety of experiments, by showing that it consistently outperforms PBO based on Laplace's approximation both in terms of convergence speed and computational time. We also show that our framework can be extended to deal with mixed preferential-categorical BO, where binary judgments (valid or non-valid) together with preference judgments are available.

MLJul 13, 2020
Orthogonally Decoupled Variational Fourier Features

Dario Azzimonti, Manuel Schürch, Alessio Benavoli et al.

Sparse inducing points have long been a standard method to fit Gaussian processes to big data. In the last few years, spectral methods that exploit approximations of the covariance kernel have shown to be competitive. In this work we exploit a recently introduced orthogonally decoupled variational basis to combine spectral methods and sparse inducing points methods. We show that the method is competitive with the state-of-the-art on synthetic and on real-world data.

LGMay 26, 2020
Skew Gaussian Processes for Classification

Alessio Benavoli, Dario Azzimonti, Dario Piga

Gaussian processes (GPs) are distributions over functions, which provide a Bayesian nonparametric approach to regression and classification. In spite of their success, GPs have limited use in some applications, for example, in some cases a symmetric distribution with respect to its mean is an unreasonable model. This implies, for instance, that the mean and the median coincide, while the mean and median in an asymmetric (skewed) distribution can be different numbers. In this paper, we propose Skew-Gaussian processes (SkewGPs) as a non-parametric prior over functions. A SkewGP extends the multivariate Unified Skew-Normal distribution over finite dimensional vectors to a stochastic processes. The SkewGP class of distributions includes GPs and, therefore, SkewGPs inherit all good properties of GPs and increase their flexibility by allowing asymmetry in the probabilistic model. By exploiting the fact that SkewGP and probit likelihood are conjugate model, we derive closed form expressions for the marginal likelihood and predictive distribution of this new nonparametric classifier. We verify empirically that the proposed SkewGP classifier provides a better performance than a GP classifier based on either Laplace's method or Expectation Propagation.

MLMay 28, 2019
Recursive Estimation for Sparse Gaussian Process Regression

Manuel Schürch, Dario Azzimonti, Alessio Benavoli et al.

Gaussian Processes (GPs) are powerful kernelized methods for non-parameteric regression used in many applications. However, their use is limited to a few thousand of training samples due to their cubic time complexity. In order to scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. In this work we investigate the connection between a general class of sparse inducing point GP regression methods and Bayesian recursive estimation which enables Kalman Filter like updating for online learning. The majority of previous work has focused on the batch setting, in particular for learning the model parameters and the position of the inducing points, here instead we focus on training with mini-batches. By exploiting the Kalman filter formulation, we propose a novel approach that estimates such parameters by recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state of the art methods, our method keeps analytic updates for the mean and covariance of the posterior, thus reducing drastically the size of the optimization problem. We show that our method achieves faster convergence and superior performance compared to state of the art sequential Gaussian Process regression on synthetic GP as well as real-world data with up to a million of data samples.

LGSep 28, 2016
Statistical comparison of classifiers through Bayesian hierarchical modelling

Giorgio Corani, Alessio Benavoli, Janez Demšar et al.

Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (nhst). Yet the nhst tests suffer from important shortcomings, which can be overcome by switching to Bayesian hypothesis testing. We propose a Bayesian hierarchical model which jointly analyzes the cross-validation results obtained by two classifiers on multiple data sets. It returns the posterior probability of the accuracies of the two classifiers being practically equivalent or significantly different. A further strength of the hierarchical model is that, by jointly analyzing the results obtained on all data sets, it reduces the estimation error compared to the usual approach of averaging the cross-validation results obtained on a given data set.

MLJun 14, 2016
Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis

Alessio Benavoli, Giorgio Corani, Janez Demsar et al.

The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields however realized the shortcomings of frequentist reasoning and in the most radical cases even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for abandonment of NHST by exposing its fallacies and, more importantly, offer better - more sound and useful - alternatives for it.

LGJan 7, 2016
State Space representation of non-stationary Gaussian Processes

Alessio Benavoli, Marco Zaffalon

The state space (SS) representation of Gaussian processes (GP) has recently gained a lot of interest. The main reason is that it allows to compute GPs based inferences in O(n), where $n$ is the number of observations. This implementation makes GPs suitable for Big Data. For this reason, it is important to provide a SS representation of the most important kernels used in machine learning. The aim of this paper is to show how to exploit the transient behaviour of SS models to map non-stationary kernels to SS models.

LGMay 9, 2015
Should we really use post-hoc tests based on mean-ranks?

Alessio Benavoli, Giorgio Corani, Francesca Mangili

The statistical comparison of multiple algorithms over multiple data sets is fundamental in machine learning. This is typically carried out by the Friedman test. When the Friedman test rejects the null hypothesis, multiple comparisons are carried out to establish which are the significant differences among algorithms. The multiple comparisons are usually performed using the mean-ranks test. The aim of this technical note is to discuss the inconsistencies of the mean-ranks post-hoc test with the goal of discouraging its use in machine learning as well as in medicine, psychology, etc.. We show that the outcome of the mean-ranks test depends on the pool of algorithms originally included in the experiment. In other words, the outcome of the comparison between algorithms A and B depends also on the performance of the other algorithms included in the original experiment. This can lead to paradoxical situations. For instance the difference between A and B could be declared significant if the pool comprises algorithms C, D, E and not significant if the pool comprises algorithms F, G, H. To overcome these issues, we suggest instead to perform the multiple comparison using a test whose outcome only depends on the two algorithms being compared, such as the sign-test or the Wilcoxon signed-rank test.

AISep 26, 2013
On the Complexity of Strong and Epistemic Credal Networks

Denis D. Maua, Cassio Polpo de Campos, Alessio Benavoli et al.

Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with ternary variables. We prove that under epistemic irrelevance the polynomial time complexity of inferences in credal trees is not likely to extend to more general models (e.g. singly connected networks). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding computational complexity.