Mauricio A. Álvarez

h-index21

41papers

2,528citations

Novelty48%

AI Score45

Ranked #41,356 of 194,257 authors (top 21%)#525 in ML (top 16%)

41 Papers

11.8IVMar 29, 2022Code

Angular Super-Resolution in Diffusion MRI with a 3D Recurrent Convolutional Autoencoder

Matthew Lyon, Paul Armitage, Mauricio A. Álvarez

High resolution diffusion MRI (dMRI) data is often constrained by limited scanning time in clinical settings, thus restricting the use of downstream analysis techniques that would otherwise be available. In this work we develop a 3D recurrent convolutional neural network (RCNN) capable of super-resolving dMRI volumes in the angular (q-space) domain. Our approach formulates the task of angular super-resolution as a patch-wise regression using a 3D autoencoder conditioned on target b-vectors. Within the network we use a convolutional long short term memory (ConvLSTM) cell to model the relationship between q-space samples. We compare model performance against a baseline spherical harmonic interpolation and a 1D variant of the model architecture. We show that the 3D model has the lowest error rates across different subsampling schemes and b-values. The relative performance of the 3D RCNN is greatest in the very low angular resolution domain. Code for this project is available at https://github.com/m-lyon/dMRI-RCNN.

5.9MLOct 17, 2023

Thin and Deep Gaussian Processes

Daniel Augusto de Souza, Alexander Nikitin, ST John et al.

Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn low-dimensional embeddings of the inputs that explain the output data. Following the architecture of deep neural networks, the most common deep GPs warp the input space layer-by-layer but lose all the interpretability of shallow GPs. An alternative construction is to successively parameterize the lengthscale of a kernel, improving the interpretability but ultimately giving away the notion of learning lower-dimensional embeddings. Unfortunately, both methods are susceptible to particular pathologies which may hinder fitting and limit their interpretability. This work proposes a novel synthesis of both previous approaches: Thin and Deep GP (TDGP). Each TDGP layer defines locally linear transformations of the original input data maintaining the concept of latent embeddings while also retaining the interpretation of lengthscales of a kernel. Moreover, unlike the prior solutions, TDGP induces non-pathological manifolds that admit learning lower-dimensional representations. We show with theoretical and experimental results that i) TDGP is, unlike previous models, tailored to specifically discover lower-dimensional manifolds in the input data, ii) TDGP behaves well when increasing the number of layers, and iii) TDGP performs well in standard benchmark datasets.

1.2NAJun 15, 2018

Bayesian inversion of a diffusion evolution equation with application to Biology

Jean-Charles Croix, Nicolas Durrande, Mauricio Alvarez

A common task in experimental sciences is to fit mathematical models to real-world measurements to improve understanding of natural phenomenon (reverse-engineering or inverse modeling). When complex dynamical systems are considered, such as partial differential equations, this task may become challenging and ill-posed. In this work, a linear parabolic equation is considered where the objective is to estimate both the differential operator coefficients and the source term at once. The Bayesian methodology for inverse problems provides a form of regularization while quantifying uncertainty as the solution is a probability measure taking in account data. This posterior distribution, which is non-Gaussian and infinite dimensional, is then summarized through a mode and sampled using a state-of-the-art Markov-Chain Monte-Carlo algorithm based on a geometric approach. After a rigorous analysis, this methodology is applied on a dataset of the post-transcriptional regulation of Kni gap gene in the early development of Drosophila Melanogaster where mRNA concentration and both diffusion and depletion rates are inferred from noisy measurement of the protein concentration

3.8MLJun 17, 2022Code

Shallow and Deep Nonparametric Convolutions for Gaussian Processes

Thomas M. McDonald, Magnus Ross, Michael T. Smith et al.

A key challenge in the practical application of Gaussian processes (GPs) is selecting a proper covariance function. The moving average, or process convolutions, construction of GPs allows some additional flexibility, but still requires choosing a proper smoothing kernel, which is non-trivial. Previous approaches have built covariance functions by using GP priors over the smoothing kernel, and by extension the covariance, as a way to bypass the need to specify it in advance. However, such models have been limited in several ways: they are restricted to single dimensional inputs, e.g. time; they only allow modelling of single outputs and they do not scale to large datasets since inference is not straightforward. In this paper, we introduce a nonparametric process convolution formulation for GPs that alleviates these weaknesses by using a functional sampling approach based on Matheron's rule to perform fast sampling using interdomain inducing variables. Furthermore, we propose a composition of these nonparametric convolutions that serves as an alternative to classic deep GP models, and allows the covariance functions of the intermediate layers to be inferred from the data. We test the performance of our model on benchmarks for single output GPs, multiple output GPs and deep GPs and find that our approach can provide improvements over standard GP models, particularly for larger datasets.

4.6LGJul 2, 2024

Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference

Xiaoyu Jiang, Sokratia Georgaka, Magnus Rattray et al.

The Multi-Output Gaussian Process is is a popular tool for modelling data from multiple sources. A typical choice to build a covariance function for a MOGP is the Linear Model of Coregionalization (LMC) which parametrically models the covariance between outputs. The Latent Variable MOGP (LV-MOGP) generalises this idea by modelling the covariance between outputs using a kernel applied to latent variables, one per output, leading to a flexible MOGP model that allows efficient generalization to new outputs with few data points. Computational complexity in LV-MOGP grows linearly with the number of outputs, which makes it unsuitable for problems with a large number of outputs. In this paper, we propose a stochastic variational inference approach for the LV-MOGP that allows mini-batches for both inputs and outputs, making computational complexity per training iteration independent of the number of outputs.

4.6LGJul 1, 2024Code

Adaptive RKHS Fourier Features for Compositional Gaussian Process Models

Xinxing Shi, Thomas Baldwin-McDonald, Mauricio A. Álvarez

Deep Gaussian Processes (DGPs) leverage a compositional structure to model non-stationary processes. DGPs typically rely on local inducing point approximations across intermediate GP layers. Recent advances in DGP inference have shown that incorporating global Fourier features from the Reproducing Kernel Hilbert Space (RKHS) can enhance the DGPs' capability to capture complex non-stationary patterns. This paper extends the use of these features to compositional GPs involving linear transformations. In particular, we introduce Ordinary Differential Equation(ODE)--based RKHS Fourier features that allow for adaptive amplitude and phase modulation through convolution operations. This convolutional formulation relates our work to recently proposed deep latent force models, a multi-layer structure designed for modelling nonlinear dynamical systems. By embedding these adjustable RKHS Fourier features within a doubly stochastic variational inference framework, our model exhibits improved predictive performance across various regression tasks.

4.3MLNov 24, 2023

Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning

Thomas Baldwin-McDonald, Mauricio A. Álvarez

Modelling the behaviour of highly nonlinear dynamical systems with robust uncertainty quantification is a challenging task which typically requires approaches specifically designed to address the problem at hand. We introduce a domain-agnostic model to address this issue termed the deep latent force model (DLFM), a deep Gaussian process with physics-informed kernels at each layer, derived from ordinary differential equations using the framework of process convolutions. Two distinct formulations of the DLFM are presented which utilise weight-space and variational inducing points-based Gaussian process approximations, both of which are amenable to doubly stochastic variational inference. We present empirical evidence of the capability of the DLFM to capture the dynamics present in highly nonlinear real-world multi-output time series data. Additionally, we find that the DLFM is capable of achieving comparable performance to a range of non-physics-informed probabilistic models on benchmark univariate regression tasks. We also empirically assess the negative impact of the inducing points framework on the extrapolation capabilities of LFM-based models.

6.6CVJun 18

Gaussian Process Prior Variational Autoencoder for Endoscopic Videos

Ivan De Boi, Xinxing Shi, Xiaoyu Jiang et al.

Endoscopic video analysis is essential for gastrointestinal diagnosis and computer-assisted interventions, but video sequences are routinely degraded by specular reflections, motion artifacts, and missing frames. These transient corruptions can distract clinicians, reduce image interpretability, and disrupt downstream tasks such as 3D reconstruction and navigation. Effective restoration therefore requires methods that exploit temporal continuity rather than treating frames in isolation. We introduce a Gaussian Process Prior Variational Autoencoder (GPVAE) framework for endoscopic video restoration that replaces the standard factorized latent prior with a temporal Gaussian process prior, enabling interpolation of missing frames with uncertainty-aware reconstruction. The framework combines endoscopy-specific encoders, including a convolutional EndoVAE backbone and pretrained Vision Transformer encoders from GastroNet-5M, with two scalable GP approximations: Hierarchical Prior Approximation (HPA) and Sparse Precision Approximation (SPA). Specular reflections are handled using a DUCKNet-based masking pipeline that excludes corrupted pixels from the reconstruction objective. On the C3VDv2 colonoscopy dataset, the best GPVAE variants reduced image reconstruction RMSE by 21.9\% on average, and by up to 26.1\%, relative to matched VAE baselines. Downstream trajectory RMSE was reduced by 12.7\% on average across classical visual odometry and a pretrained PoseNet, at an average increase of 27.3\% in training time per epoch. Finally, the GP posterior provides per-frame uncertainty estimates that reflect temporal support and offer a confidence signal for restored frames.

11.4LGApr 7, 2025

Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding

Haiping Liu, Lijing Lin, Jingyuan Sun et al.

Rotary Position Embedding (RoPE) is widely adopted in large language models (LLMs) due to its efficient encoding of relative positions with strong extrapolation capabilities. However, while its application in higher-dimensional input domains, such as 2D images, have been explored in several attempts, a unified theoretical framework is still lacking. To address this, we propose a systematic mathematical framework for RoPE grounded in Lie group and Lie algebra theory. We derive the necessary and sufficient conditions for any valid $N$-dimensional RoPE based on two core properties of RoPE - relativity and reversibility. We demonstrate that RoPE can be characterized as a basis of a maximal abelian subalgebra (MASA) in the special orthogonal Lie algebra, and that the commonly used axis-aligned block-diagonal RoPE, where each input axis is encoded by an independent 2x2 rotation block, corresponds to the maximal toral subalgebra. Furthermore, we reduce spatial inter-dimensional interactions to a change of basis, resolved by learning an orthogonal transformation. Our experiment results suggest that inter-dimensional interactions should be balanced with local structure preservation. Overall, our framework unifies and explains existing RoPE designs while enabling principled extensions to higher-dimensional modalities and tasks.

1.2GNDec 19, 2023Code

Longitudinal prediction of DNA methylation to forecast epigenetic outcomes

Arthur Leroy, Ai Ling Teh, Frank Dondelinger et al.

Interrogating the evolution of biological changes at early stages of life requires longitudinal profiling of molecules, such as DNA methylation, which can be challenging with children. We introduce a probabilistic and longitudinal machine learning framework based on multi-mean Gaussian processes (GPs), accounting for individual and gene correlations across time. This method provides future predictions of DNA methylation status at different individual ages while accounting for uncertainty. Our model is trained on a birth cohort of children with methylation profiled at ages 0-4, and we demonstrated that the status of methylation sites for each child can be accurately predicted at ages 5-7. We show that methylation profiles predicted by multi-mean GPs can be used to estimate other phenotypes, such as epigenetic age, and enable comparison to other health measures of interest. This approach encourages epigenetic studies to move towards longitudinal design for investigating epigenetic changes during development, ageing and disease progression.

1.2APOct 27, 2025

Towards Gaussian processes modelling to study the late effects of radiotherapy in children and young adults with brain tumours

Angela Davey, Arthur Leroy, Eliana Vasquez Osorio et al.

Survivors of childhood cancer need lifelong monitoring for side effects from radiotherapy. However, longitudinal data from routine monitoring is often infrequently and irregularly sampled, and subject to inaccuracies. Due to this, measurements are often studied in isolation, or simple relationships (e.g., linear) are used to impute missing timepoints. In this study, we investigated the potential role of Gaussian Processes (GP) modelling to make population-based and individual predictions, using insulin-like growth factor 1 (IGF-1) measurements as a test case. With training data of 23 patients with a median (range) of 4 (1-16) timepoints we identified a trend within the range of literature reported values. In addition, with 8 test cases, individual predictions were made with an average root mean squared error of 31.9 (10.1 - 62.3) ng/ml and 27.4 (0.02 - 66.1) ng/ml for two approaches. GP modelling may overcome limitations of routine longitudinal data and facilitate analysis of late effects of radiotherapy.

4.1LGOct 6, 2025

Counterfactual Credit Guided Bayesian Optimization

Qiyu Wei, Haowei Wang, Richard Allmendinger et al.

Bayesian optimization has emerged as a prominent methodology for optimizing expensive black-box functions by leveraging Gaussian process surrogates, which focus on capturing the global characteristics of the objective function. However, in numerous practical scenarios, the primary objective is not to construct an exhaustive global surrogate, but rather to quickly pinpoint the global optimum. Due to the aleatoric nature of the sequential optimization problem and its dependence on the quality of the surrogate model and the initial design, it is restrictive to assume that all observed samples contribute equally to the discovery of the optimum in this context. In this paper, we introduce Counterfactual Credit Guided Bayesian Optimization (CCGBO), a novel framework that explicitly quantifies the contribution of individual historical observations through counterfactual credit. By incorporating counterfactual credit into the acquisition function, our approach can selectively allocate resources in areas where optimal solutions are most likely to occur. We prove that CCGBO retains sublinear regret. Empirical evaluations on various synthetic and real-world benchmarks demonstrate that CCGBO consistently reduces simple regret and accelerates convergence to the global optimum.

11.4LGMay 22, 2025Code

Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling

Xinxing Shi, Xiaoyu Jiang, Mauricio A. Álvarez

Gaussian Process (GP) Variational Autoencoders (VAEs) extend standard VAEs by replacing the fully factorised Gaussian prior with a GP prior, thereby capturing richer correlations among latent variables. However, performing exact GP inference in large-scale GPVAEs is computationally prohibitive, often forcing existing approaches to rely on restrictive kernel assumptions or large sets of inducing points. In this work, we propose a neighbour-driven approximation strategy that exploits local adjacencies in the latent space to achieve scalable GPVAE inference. By confining computations to the nearest neighbours of each data point, our method preserves essential latent dependencies, allowing more flexible kernel choices and mitigating the need for numerous inducing points. Through extensive experiments on tasks including representation learning, data imputation, and conditional generation, we demonstrate that our approach outperforms other GPVAE variants in both predictive performance and computational efficiency.

5.3MLFeb 9, 2022

Adjoint-aided inference of Gaussian process driven differential equations

Paterne Gahungu, Christopher W Lanyon, Mauricio A Alvarez et al.

Linear systems occur throughout engineering and the sciences, most notably as differential equations. In many cases the forcing function for the system is unknown, and interest lies in using noisy observations of the system to infer the forcing, as well as other unknown parameters. In differential equations, the forcing function is an unknown function of the independent variables (typically time and space), and can be modelled as a Gaussian process (GP). In this paper we show how the adjoint of a linear system can be used to efficiently infer forcing functions modelled as GPs, using a truncated basis expansion of the GP kernel. We show how exact conjugate Bayesian inference for the truncated GP can be achieved, in many cases with substantially lower computation than would be required using MCMC methods. We demonstrate the approach on systems of both ordinary and partial differential equations, and show that the basis expansion approach approximates well the true forcing with a modest number of basis vectors. Finally, we show how to infer point estimates for the non-linear model parameters, such as the kernel length-scales, using Bayesian optimisation.

1.9MLOct 26, 2021Code

Modular Gaussian Processes for Transfer Learning

Pablo Moreno-Muñoz, Antonio Artés-Rodríguez, Mauricio A. Álvarez

We present a framework for transfer learning based on modular variational Gaussian processes (GP). We develop a module-based method that having a dictionary of well fitted GPs, one could build ensemble GP models without revisiting any data. Each model is characterised by its hyperparameters, pseudo-inputs and their corresponding posterior densities. Our method avoids undesired data centralisation, reduces rising computational costs and allows the transfer of learned uncertainty metrics after training. We exploit the augmentation of high-dimensional integral operators based on the Kullback-Leibler divergence between stochastic processes to introduce an efficient lower bound under all the sparse variational GPs, with different complexity and even likelihood distribution. The method is also valid for multi-output GPs, learning correlations a posteriori between independent modules. Extensive results illustrate the usability of our framework in large-scale and multi-task experiments, also compared with the exact inference methods in the literature.

8.4MLJun 10, 2021Code

Compositional Modeling of Nonlinear Dynamical Systems with ODE-based Random Features

Thomas M. McDonald, Mauricio A. Álvarez

Effectively modeling phenomena present in highly nonlinear dynamical systems whilst also accurately quantifying uncertainty is a challenging task, which often requires problem-specific techniques. We present a novel, domain-agnostic approach to tackling this problem, using compositions of physics-informed random features, derived from ordinary differential equations. The architecture of our model leverages recent advances in approximate inference for deep Gaussian processes, such as layer-wise weight-space approximations which allow us to incorporate random Fourier features, and stochastic variational inference for approximate Bayesian inference. We provide evidence that our model is capable of capturing highly nonlinear behaviour in real-world multivariate time series data. In addition, we find that our approach achieves comparable performance to a number of other probabilistic models on benchmark regression tasks.

5.0MLJun 10, 2021Code

Learning Nonparametric Volterra Kernels with Gaussian Processes

Magnus Ross, Michael T. Smith, Mauricio A. Álvarez

This paper introduces a method for the nonparametric Bayesian learning of nonlinear operators, through the use of the Volterra series with kernels represented using Gaussian processes (GPs), which we term the nonparametric Volterra kernels model (NVKM). When the input function to the operator is unobserved and has a GP prior, the NVKM constitutes a powerful method for both single and multiple output regression, and can be viewed as a nonlinear and nonparametric latent force model. When the input function is observed, the NVKM can be used to perform Bayesian system identification. We use recent advances in efficient sampling of explicit functions from GPs to map process realisations through the Volterra series without resorting to numerical integration, allowing scalability through doubly stochastic variational inference, and avoiding the need for Gaussian approximations of the output processes. We demonstrate the performance of the model for both multiple output regression and system identification using standard benchmarks.

2.7MLOct 6, 2020Code

Recyclable Gaussian Processes

Pablo Moreno-Muñoz, Antonio Artés-Rodríguez, Mauricio A. Álvarez

We present a new framework for recycling independent variational approximations to Gaussian processes. The main contribution is the construction of variational ensembles given a dictionary of fitted Gaussian processes without revisiting any subset of observations. Our framework allows for regression, classification and heterogeneous tasks, i.e. mix of continuous and discrete variables over the same input domain. We exploit infinite-dimensional integral operators based on the Kullback-Leibler divergence between stochastic processes to re-combine arbitrary amounts of variational sparse approximations with different complexity, likelihood model and location of the pseudo-inputs. Extensive results illustrate the usability of our framework in large-scale distributed experiments, also compared with the exact inference models in the literature.

12.5MLSep 27, 2020

Multi-task Causal Learning with Gaussian Processes

Virginia Aglietti, Theodoros Damoulas, Mauricio Álvarez et al.

This paper studies the problem of learning the correlation structure of a set of intervention functions defined on the directed acyclic graph (DAG) of a causal model. This is useful when we are interested in jointly learning the causal effects of interventions on different subsets of variables in a DAG, which is common in field such as healthcare or operations research. We propose the first multi-task causal Gaussian process (GP) model, which we call DAG-GP, that allows for information sharing across continuous interventions and across experiments on different variables. DAG-GP accommodates different assumptions in terms of data availability and captures the correlation between functions lying in input spaces of different dimensionality via a well-defined integral operator. We give theoretical results detailing when and how the DAG-GP model can be formulated depending on the DAG. We test both the quality of its predictions and its calibrated uncertainties. Compared to single-task models, DAG-GP achieves the best fitting performance in a variety of real and synthetic settings. In addition, it helps to select optimal interventions faster than competing approaches when used within sequential decision making frameworks, like active learning or Bayesian optimization.

2.7LGNov 28, 2019

Machine Learning for a Low-cost Air Pollution Network

Michael T. Smith, Joel Ssematimba, Mauricio A. Alvarez et al.

Data collection in economically constrained countries often necessitates using approximate and biased measurements due to the low-cost of the sensors used. This leads to potentially invalid predictions and poor policies or decision making. This is especially an issue if methods from resource-rich regions are applied without handling these additional constraints. In this paper we show, through the use of an air pollution network example, how using probabilistic machine learning can mitigate some of the technical constraints. Specifically we experiment with modelling the calibration for individual sensors as either distributions or Gaussian processes over time, and discuss the wider issues around the decision process.

3.2MLNov 22, 2019Code

A Fully Natural Gradient Scheme for Improving Inference of the Heterogeneous Multi-Output Gaussian Process Model

Juan-José Giraldo, Mauricio A. Álvarez

A recent novel extension of multi-output Gaussian processes handles heterogeneous outputs assuming that each output has its own likelihood function. It uses a vector-valued Gaussian process prior to jointly model all likelihoods' parameters as latent functions drawn from a Gaussian process with a linear model of coregionalisation covariance. By means of an inducing points framework, the model is able to obtain tractable variational bounds amenable to stochastic variational inference. Nonetheless, the strong conditioning between the variational parameters and the hyper-parameters burdens the adaptive gradient optimisation methods used in the original approach. To overcome this issue we borrow ideas from variational optimisation introducing an exploratory distribution over the hyper-parameters, allowing inference together with the posterior's variational parameters through a fully natural gradient optimisation scheme. Furthermore, in this work we introduce an extension of the heterogeneous multi-output model, where its latent functions are drawn from convolution processes. We show that our optimisation scheme can achieve better local optima solutions with higher test performance rates than adaptive gradient methods, this for both the linear model of coregionalisation and the convolution processes model. We also show how to make the convolutional model scalable by means of stochastic variational inference and how to optimise it through a fully natural gradient scheme. We compare the performance of the different methods over toy and real databases.

7.7MLOct 31, 2019Code

Continual Multi-task Gaussian Processes

Pablo Moreno-Muñoz, Antonio Artés-Rodríguez, Mauricio A. Álvarez

We address the problem of continual learning in multi-task Gaussian process (GP) models for handling sequential input-output observations. Our approach extends the existing prior-posterior recursion of online Bayesian inference, i.e.\ past posterior discoveries become future prior beliefs, to the infinite functional space setting of GP. For a reason of scalability, we introduce variational inference together with an sparse approximation based on inducing inputs. As a consequence, we obtain tractable continual lower-bounds where two novel Kullback-Leibler (KL) divergences intervene in a natural way. The key technical property of our method is the recursive reconstruction of conditional GP priors conditioned on the variational parameters learned so far. To achieve this goal, we introduce a novel factorization of past variational distributions, where the predictive GP equation propagates the posterior uncertainty forward. We then demonstrate that it is possible to derive GP models over many types of sequential observations, either discrete or continuous and amenable to stochastic optimization. The continual inference approach is also applicable to scenarios where potential multi-channel or heterogeneous observations might appear. Extensive experiments demonstrate that the method is fully scalable, shows a reliable performance and is robust to uncertainty error propagation over a plenty of synthetic and real-world datasets.

16.3CRSep 19, 2019

Adversarial Vulnerability Bounds for Gaussian Process Classification

Michael Thomas Smith, Kathrin Grosse, Michael Backes et al.

Machine learning (ML) classification is increasingly used in safety-critical systems. Protecting ML classifiers from adversarial examples is crucial. We propose that the main threat is that of an attacker perturbing a confidently classified input to produce a confident misclassification. To protect against this we devise an adversarial bound (AB) for a Gaussian process classifier, that holds for the entire input domain, bounding the potential for any future adversarial method to cause such misclassification. This is a formal guarantee of robustness, not just an empirically derived result. We investigate how to configure the classifier to maximise the bound, including the use of a sparse approximation, leading to the method producing a practical, useful and provably robust classifier, which we test using a variety of datasets.

4.8LGSep 19, 2019

Differentially Private Regression and Classification with Sparse Gaussian Processes

Michael Thomas Smith, Mauricio A. Alvarez, Neil D. Lawrence

A continuing challenge for machine learning is providing methods to perform computation on data while ensuring the data remains private. In this paper we build on the provable privacy guarantees of differential privacy which has been combined with Gaussian processes through the previously published \emph{cloaking method}. In this paper we solve several shortcomings of this method, starting with the problem of predictions in regions with low data density. We experiment with the use of inducing points to provide a sparse approximation and show that these can provide robust differential privacy in outlier areas and at higher dimensions. We then look at classification, and modify the Laplace approximation approach to provide differentially private predictions. We then combine this with the sparse approximation and demonstrate the capability to perform classification in high dimensions. We finally explore the issue of hyperparameter selection and develop a method for their private selection. This paper and associated libraries provide a robust toolkit for combining differential privacy and GPs in a practical manner.

11.8MLJun 22, 2019Code

Multi-task Learning for Aggregated Data using Gaussian Processes

Fariba Yousefi, Michael Thomas Smith, Mauricio A. Álvarez

Aggregated data is commonplace in areas such as epidemiology and demography. For example, census data for a population is usually given as averages defined over time periods or spatial resolutions (cities, regions or countries). In this paper, we present a novel multi-task learning model based on Gaussian processes for joint learning of variables that have been aggregated at different input scales. Our model represents each task as the linear combination of the realizations of latent processes that are integrated at a different scale per task. We are then able to compute the cross-covariance between the different tasks either analytically or numerically. We also allow each task to have a potentially different likelihood model and provide a variational lower bound that can be optimised in a stochastic fashion making our model suitable for larger datasets. We show examples of the model in a synthetic example, a fertility dataset, and an air pollution prediction application.

9.4MLJun 21, 2019

Black-Box Inference for Non-Linear Latent Force Models

Wil O. C. Ward, Tom Ryder, Dennis Prangle et al.

Latent force models are systems whereby there is a mechanistic model describing the dynamics of the system state, with some unknown forcing term that is approximated with a Gaussian process. If such dynamics are non-linear, it can be difficult to estimate the posterior state and forcing term jointly, particularly when there are system parameters that also need estimating. This paper uses black-box variational inference to jointly estimate the posterior, designing a multivariate extension to local inverse autoregressive flows as a flexible approximater of the system. We compare estimates on systems where the posterior is known, demonstrating the effectiveness of the approximation, and apply to problems with non-linear dynamics, multi-output systems and models with non-Gaussian likelihoods.

2.7LGJan 7, 2019

Variational bridge constructs for approximate Gaussian process regression

Wil O C Ward, Mauricio A Álvarez

This paper introduces a method to approximate Gaussian process regression by representing the problem as a stochastic differential equation and using variational inference to approximate solutions. The approximations are compared with full GP regression and generated paths are demonstrated to be indistinguishable from GP samples. We show that the approach extends easily to non-linear dynamics and discuss extensions to which the approach can be easily applied.

5.1ASOct 30, 2018Code

Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain

Pablo A. Alvarado, Mauricio A. Álvarez, Dan Stowell

Gaussian process (GP) audio source separation is a time-domain approach that circumvents the inherent phase approximation issue of spectrogram based methods. Furthermore, through its kernel, GPs elegantly incorporate prior knowledge about the sources into the separation model. Despite these compelling advantages, the computational complexity of GP inference scales cubically with the number of audio samples. As a result, source separation GP models have been restricted to the analysis of short audio frames. We introduce an efficient application of GPs to time-domain audio source separation, without compromising performance. For this purpose, we used GP regression, together with spectral mixture kernels, and variational sparse GPs. We compared our method with LD-PSDTF (positive semi-definite tensor factorization), KL-NMF (Kullback-Leibler non-negative matrix factorization), and IS-NMF (Itakura-Saito NMF). Results show that the proposed method outperforms these techniques.

10.9MLOct 10, 2018

Non-linear process convolutions for multi-output Gaussian processes

Mauricio A. Álvarez, Wil O. C. Ward, Cristian Guarnizo

The paper introduces a non-linear version of the process convolution formalism for building covariance functions for multi-output Gaussian processes. The non-linearity is introduced via Volterra series, one series per each output. We provide closed-form expressions for the mean function and the covariance function of the approximated Gaussian process at the output of the Volterra series. The mean function and covariance function for the joint Gaussian process are derived using formulae for the product moments of Gaussian variables. We compare the performance of the non-linear model against the classical process convolution approach in one synthetic dataset and two real datasets.

4.9MLAug 29, 2018Code

Physically-Inspired Gaussian Process Models for Post-Transcriptional Regulation in Drosophila

Andrés F. López-Lopera, Nicolas Durrande, Mauricio A. Alvarez

The regulatory process of Drosophila is thoroughly studied for understanding a great variety of biological principles. While pattern-forming gene networks are analysed in the transcription step, post-transcriptional events (e.g. translation, protein processing) play an important role in establishing protein expression patterns and levels. Since the post-transcriptional regulation of Drosophila depends on spatiotemporal interactions between mRNAs and gap proteins, proper physically-inspired stochastic models are required to study the link between both quantities. Previous research attempts have shown that using Gaussian processes (GPs) and differential equations lead to promising predictions when analysing regulatory networks. Here we aim at further investigating two types of physically-inspired GP models based on a reaction-diffusion equation where the main difference lies in where the prior is placed. While one of them has been studied previously using protein data only, the other is novel and yields a simple approach requiring only the differentiation of kernel functions. In contrast to other stochastic frameworks, discretising the spatial space is not required here. Both GP models are tested under different conditions depending on the availability of gap gene mRNA expression data. Finally, their performances are assessed on a high-resolution dataset describing the blastoderm stage of the early embryo of Drosophila melanogaster

14.7MLMay 19, 2018Code

Heterogeneous Multi-output Gaussian Process Prediction

Pablo Moreno-Muñoz, Antonio Artés-Rodríguez, Mauricio A. Álvarez

We present a novel extension of multi-output Gaussian processes for handling heterogeneous outputs. We assume that each output has its own likelihood function and use a vector-valued Gaussian process prior to jointly model the parameters in all likelihoods as latent functions. Our multi-output Gaussian process uses a covariance function with a linear model of coregionalisation form. Assuming conditional independence across the underlying latent functions together with an inducing variable framework, we are able to obtain tractable variational bounds amenable to stochastic variational inference. We illustrate the performance of the model on synthetic data and two real datasets: a human behavioral study and a demographic high-dimensional dataset.

6.1MLMay 18, 2018Code

Fast Kernel Approximations for Latent Force Models and Convolved Multiple-Output Gaussian processes

Cristian Guarnizo, Mauricio A. Álvarez

A latent force model is a Gaussian process with a covariance function inspired by a differential operator. Such covariance function is obtained by performing convolution integrals between Green's functions associated to the differential operators, and covariance functions associated to latent functions. In the classical formulation of latent force models, the covariance functions are obtained analytically by solving a double integral, leading to expressions that involve numerical solutions of different types of error functions. In consequence, the covariance matrix calculation is considerably expensive, because it requires the evaluation of one or more of these error functions. In this paper, we use random Fourier features to approximate the solution of these double integrals obtaining simpler analytical expressions for such covariance functions. We show experimental results using ordinary differential operators and provide an extension to build general kernel functions for convolved multiple output Gaussian processes.

11.7SYSep 15, 2017

Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems

Simo Särkkä, Mauricio A. Álvarez, Neil D. Lawrence

This article is concerned with learning and stochastic control in physical systems which contain unknown input signals. These unknown signals are modeled as Gaussian processes (GP) with certain parametrized covariance structures. The resulting latent force models (LFMs) can be seen as hybrid models that contain a first-principles physical model part and a non-parametric GP model part. We briefly review the statistical inference and learning methods for this kind of models, introduce stochastic control methodology for the models, and provide new theoretical observability and controllability results for them.

12.8MLMay 27, 2017

Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes

Zhenwen Dai, Mauricio A. Álvarez, Neil D. Lawrence

Often in machine learning, data are collected as a combination of multiple conditions, e.g., the voice recordings of multiple persons, each labeled with an ID. How could we build a model that captures the latent information related to these conditions and generalize to a new one with few data? We present a new model called Latent Variable Multiple Output Gaussian Processes (LVMOGP) and that allows to jointly model multiple conditions for regression and generalize to a new condition with a few data points at test time. LVMOGP infers the posteriors of Gaussian processes together with a latent space representing the information about different conditions. We derive an efficient variational inference method for LVMOGP, of which the computational complexity is as low as sparse Gaussian processes. We show that LVMOGP significantly outperforms related Gaussian process methods on various tasks with both synthetic and real data.

1.2NCAug 17, 2016

A Three Spatial Dimension Wave Latent Force Model for Describing Excitation Sources and Electric Potentials Produced by Deep Brain Stimulation

Pablo A. Alvarado, Mauricio A. Álvarez, Álvaro A. Orozco

Deep brain stimulation (DBS) is a surgical treatment for Parkinson's Disease. Static models based on quasi-static approximation are common approaches for DBS modeling. While this simplification has been validated for bioelectric sources, its application to rapid stimulation pulses, which contain more high-frequency power, may not be appropriate, as DBS therapeutic results depend on stimulus parameters such as frequency and pulse width, which are related to time variations of the electric field. We propose an alternative hybrid approach based on probabilistic models and differential equations, by using Gaussian processes and wave equation. Our model avoids quasi-static approximation, moreover, it is able to describe dynamic behavior of DBS. Therefore, the proposed model may be used to obtain a more realistic phenomenon description. The proposed model can also solve inverse problems, i.e. to recover the corresponding source of excitation, given electric potential distribution. The electric potential produced by a time-varying source was predicted using proposed model. For static sources, the electric potential produced by different electrode configurations were modeled. Four different sources of excitation were recovered by solving the inverse problem. We compare our outcomes with the electric potential obtained by solving Poisson's equation using the Finite Element Method (FEM). Our approach is able to take into account time variations of the source and the produced field. Also, inverse problem can be addressed using the proposed model. The electric potential calculated with the proposed model is close to the potential obtained by solving Poisson's equation using FEM.

1.1CVJun 25, 2016

A Tucker decomposition process for probabilistic modeling of diffusion magnetic resonance imaging

Hernan Dario Vargas Cardona, Mauricio A. Alvarez, Alvaro A. Orozco

Diffusion magnetic resonance imaging (dMRI) is an emerging medical technique used for describing water diffusion in an organic tissue. Typically, rank-2 tensors quantify this diffusion. From this quantification, it is possible to calculate relevant scalar measures (i.e. fractional anisotropy and mean diffusivity) employed in clinical diagnosis of neurological diseases. Nonetheless, 2nd-order tensors fail to represent complex tissue structures like crossing fibers. To overcome this limitation, several researchers proposed a diffusion representation with higher order tensors (HOT), specifically 4th and 6th orders. However, the current acquisition protocols of dMRI data allow images with a spatial resolution between 1 $mm^3$ and 2 $mm^3$. This voxel size is much smaller than tissue structures. Therefore, several clinical procedures derived from dMRI may be inaccurate. Interpolation has been used to enhance resolution of dMRI in a tensorial space. Most interpolation methods are valid only for rank-2 tensors and a generalization for HOT data is missing. In this work, we propose a novel stochastic process called Tucker decomposition process (TDP) for performing HOT data interpolation. Our model is based on the Tucker decomposition and Gaussian processes as parameters of the TDP. We test the TDP in 2nd, 4th and 6th rank HOT fields. For rank-2 tensors, we compare against direct interpolation, log-Euclidean approach and Generalized Wishart processes. For rank-4 and rank-6 tensors we compare against direct interpolation. Results obtained show that TDP interpolates accurately the HOT fields and generalizes to any rank.

1.3MLMar 16, 2016

Short-term time series prediction using Hilbert space embeddings of autoregressive processes

Edgar A. Valencia, Mauricio A. Álvarez

Linear autoregressive models serve as basic representations of discrete time stochastic processes. Different attempts have been made to provide non-linear versions of the basic autoregressive process, including different versions based on kernel methods. Motivated by the powerful framework of Hilbert space embeddings of distributions, in this paper we apply this methodology for the kernel embedding of an autoregressive process of order $p$. By doing so, we provide a non-linear version of an autoregressive process, that shows increased performance over the linear model in highly complex time series. We use the method proposed for one-step ahead forecasting of different time-series, and compare its performance against other non-linear methods.

1.2BIO-PHNov 23, 2015Code

Switched latent force models for reverse-engineering transcriptional regulation in gene expression data

Andrés F. López-Lopera, Mauricio A. Álvarez

To survive environmental conditions, cells transcribe their response activities into encoded mRNA sequences in order to produce certain amounts of protein concentrations. The external conditions are mapped into the cell through the activation of special proteins called transcription factors (TFs). Due to the difficult task to measure experimentally TF behaviours, and the challenges to capture their quick-time dynamics, different types of models based on differential equations have been proposed. However, those approaches usually incur in costly procedures, and they present problems to describe sudden changes in TF regulators. In this paper, we present a switched dynamical latent force model for reverse-engineering transcriptional regulation in gene expression data which allows the exact inference over latent TF activities driving some observed gene expressions through a linear differential equation. To deal with discontinuities in the dynamics, we introduce an approach that switches between different TF activities and different dynamical systems. This creates a versatile representation of transcription networks that can capture discrete changes and non-linearities We evaluate our model on both simulated data and real-data (e.g. microaerobic shift in E. coli, yeast respiration), concluding that our framework allows for the fitting of the expression data while being able to infer continuous-time TF profiles.

1.5MLMar 30, 2015

A Parzen-based distance between probability measures as an alternative of summary statistics in Approximate Bayesian Computation

Carlos D. Zuluaga, Edgar A. Valencia, Mauricio A. Álvarez

Approximate Bayesian Computation (ABC) are likelihood-free Monte Carlo methods. ABC methods use a comparison between simulated data, using different parameters drew from a prior distribution, and observed data. This comparison process is based on computing a distance between the summary statistics from the simulated data and the observed data. For complex models, it is usually difficult to define a methodology for choosing or constructing the summary statistics. Recently, a nonparametric ABC has been proposed, that uses a dissimilarity measure between discrete distributions based on empirical kernel embeddings as an alternative for summary statistics. The nonparametric ABC outperforms other methods including ABC, kernel ABC or synthetic likelihood ABC. However, it assumes that the probability distributions are discrete, and it is not robust when dealing with few observations. In this paper, we propose to apply kernel embeddings using an smoother density estimator or Parzen estimator for comparing the empirical data distributions, and computing the ABC posterior. Synthetic data and real data were used to test the Bayesian inference of our method. We compare our method with respect to state-of-the-art methods, and demonstrate that our method is a robust estimator of the posterior distribution in terms of the number of observations.

1.5MLMar 22, 2015

Indian Buffet process for model selection in convolved multiple-output Gaussian processes

Cristian Guarnizo, Mauricio A. Álvarez

Multi-output Gaussian processes have received increasing attention during the last few years as a natural mechanism to extend the powerful flexibility of Gaussian processes to the setup of multiple output variables. The key point here is the ability to design kernel functions that allow exploiting the correlations between the outputs while fulfilling the positive definiteness requisite for the covariance function. Alternatives to construct these covariance functions are the linear model of coregionalization and process convolutions. Each of these methods demand the specification of the number of latent Gaussian process used to build the covariance function for the outputs. We propose in this paper, the use of an Indian Buffet process as a way to perform model selection over the number of latent Gaussian processes. This type of model is particularly important in the context of latent force models, where the latent forces are associated to physical quantities like protein profiles or latent forces in mechanical systems. We use variational inference to estimate posterior distributions over the variables involved, and show examples of the model performance over artificial data, a motion capture dataset, and a gene expression dataset.

1.5MLFeb 7, 2015

Discriminative training for Convolved Multiple-Output Gaussian processes

Sebastián Gómez-González, Mauricio A. Álvarez, Hernán Felipe García

Multi-output Gaussian processes (MOGP) are probability distributions over vector-valued functions, and have been previously used for multi-output regression and for multi-class classification. A less explored facet of the multi-output Gaussian process is that it can be used as a generative model for vector-valued random fields in the context of pattern recognition. As a generative model, the multi-output GP is able to handle vector-valued functions with continuous inputs, as opposed, for example, to hidden Markov models. It also offers the ability to model multivariate random functions with high dimensional inputs. In this report, we use a discriminative training criteria known as Minimum Classification Error to fit the parameters of a multi-output Gaussian process. We compare the performance of generative training and discriminative training of MOGP in emotion recognition, activity recognition, and face recognition. We also compare the proposed methodology against hidden Markov models trained in a generative and in a discriminative way.