ML Aug 26, 2023
Learning variational autoencoders via MCMC speed measures
Marcel Hirt, Vasileios Kreouzis, Petros Dellaportas
Variational autoencoders (VAEs) are popular likelihood-based generative models which can be efficiently trained by maximizing an Evidence Lower Bound (ELBO). There has been much progress in improving the expressiveness of the variational distribution to obtain tighter variational bounds and increased generative performance. Whilst previous work has leveraged Markov chain Monte Carlo (MCMC) methods for the construction of variational densities, gradient-based methods for adapting the proposal distributions for deep latent variable models have received less attention. This work suggests an entropy-based adaptation for a short-run Metropolis-adjusted Langevin (MALA) or Hamiltonian Monte Carlo (HMC) chain while optimising a tighter variational bound to the log-evidence. Experiments show that this approach yields higher held-out log-likelihoods as well as improved generative metrics. Our implicit variational density can adapt to complicated posterior geometries of latent hierarchical representations arising in hierarchical VAEs.
LG Apr 25, 2024
Probabilistic Multi-Layer Perceptrons for Wind Farm Condition Monitoring
Filippo Fiocchi, Domna Ladopoulou, Petros Dellaportas
We provide a condition monitoring system for wind farms, based on normal behaviour modelling using a probabilistic multi-layer perceptron with transfer learning via fine-tuning. The model predicts the output power of the wind turbine under normal behaviour based on features retrieved from supervisory control and data acquisition (SCADA) systems. Its advantages are that (i) it can be trained with SCADA data of at least a few years, (ii) it can incorporate all SCADA data of all wind turbines in a wind farm as features, (iii) it assumes that the output power follows a normal density with heteroscedastic variance and (iv) it can predict the output of one wind turbine by borrowing strength from the data of all other wind turbines in a farm. Probabilistic guidelines for condition monitoring are given via a cumulative sum (CUSUM) control chart, which is specifically designed based on a real-data classification exercise and, hence, is adapted to the needs of a wind farm. We illustrate the performance of our model in a real SCADA data example which provides evidence that it outperforms other probabilistic prediction models.
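As a rough illustration of the monitoring step, the sketch below standardises each observed power value by the mean and standard deviation predicted by a heteroscedastic normal model and feeds the result to a one-sided CUSUM. The function name, the slack `k`, the threshold `h`, and the choice of a lower-sided chart (treating under-performance as the fault signal) are all assumptions made for illustration; the abstract does not state the chart design that was calibrated on real data.

```python
def cusum_lower(observed, pred_mean, pred_std, k=0.5, h=5.0):
    """One-sided (lower) CUSUM on standardised residuals.

    Returns the index of the first alarm, or None if no alarm is raised.
    k (slack) and h (threshold) are illustrative values, not taken from the paper.
    """
    s = 0.0
    for t, (y, mu, sigma) in enumerate(zip(observed, pred_mean, pred_std)):
        z = (y - mu) / sigma       # residual scaled by the predicted std
        s = max(0.0, s - z - k)    # accumulate evidence of under-performance
        if s > h:
            return t
    return None


# Hypothetical data: the predicted mean and std stay constant for simplicity.
mean = [100.0] * 20
std = [10.0] * 20
healthy = [100.0] * 20
degraded = [100.0] * 10 + [80.0] * 10  # power drops by two predicted stds

print(cusum_lower(healthy, mean, std))   # None
print(cusum_lower(degraded, mean, std))  # 13
```

Because the residual is divided by the predicted standard deviation, the same threshold applies in operating regimes where the model is more or less certain.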
AP May 13, 2025
Probabilistic Wind Power Modelling via Heteroscedastic Non-Stationary Gaussian Processes
Domniki Ladopoulou, Dat Minh Hong, Petros Dellaportas
Accurate probabilistic prediction of wind power is crucial for maintaining grid stability and facilitating the efficient integration of renewable energy sources. Gaussian process (GP) models offer a principled framework for quantifying uncertainty; however, conventional approaches typically rely on stationary kernels and homoscedastic noise assumptions, which are inadequate for modelling the inherently non-stationary and heteroscedastic nature of wind speed and power output. We propose a heteroscedastic non-stationary GP framework based on the generalised spectral mixture kernel, enabling the model to capture input-dependent correlations as well as input-dependent variability in wind speed-power data. We evaluate the proposed model on 10-minute supervisory control and data acquisition (SCADA) measurements and compare it against GP variants with stationary and non-stationary kernels, as well as commonly used non-GP probabilistic baselines. The results highlight the necessity of modelling both non-stationarity and heteroscedasticity in wind power prediction and demonstrate the practical value of flexible non-stationary GP models in operational SCADA settings.
LG Oct 3, 2025
Multi-task neural diffusion processes for uncertainty-quantified wind power prediction
Joseph Rawson, Domniki Ladopoulou, Petros Dellaportas
Uncertainty-aware wind power prediction is essential for grid integration and reliable wind farm operation. We apply neural diffusion processes (NDPs), a recent class of models that learn distributions over functions, and extend them to a multi-task NDP (MT-NDP) framework for wind power prediction. We provide the first empirical evaluation of NDPs on real supervisory control and data acquisition (SCADA) data. We introduce a task encoder within MT-NDPs to capture cross-turbine correlations and enable few-shot adaptation to unseen turbines. The proposed MT-NDP framework outperforms single-task NDPs and GPs in terms of point accuracy and calibration, particularly for wind turbines whose behaviour deviates from the fleet average. In general, NDP-based models deliver calibrated and scalable predictions suitable for operational deployment, offering sharper, yet trustworthy, predictive intervals that can support dispatch and maintenance decisions in modern wind farms.
ML Jun 26, 2025
Gaussian Invariant Markov Chain Monte Carlo
Michalis K. Titsias, Angelos Alexopoulos, Siran Liu et al.
We develop sampling methods that consist of Gaussian invariant versions of random walk Metropolis (RWM), the Metropolis-adjusted Langevin algorithm (MALA) and second-order Hessian or Manifold MALA. We show that, unlike standard RWM and MALA, Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy-to-use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high-dimensional targets in latent Gaussian models, where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
ST Jun 25, 2024
Variance Reduction for the Independent Metropolis Sampler
Siran Liu, Petros Dellaportas, Michalis K. Titsias
Assume that we would like to estimate the expected value of a function $F$ with respect to an intractable density $π$, which is specified up to some unknown normalising constant. We prove that if $π$ is close enough under KL divergence to another density $q$, an independent Metropolis sampler estimator that obtains samples from $π$ with proposal density $q$, enriched with a variance reduction computational strategy based on control variates, achieves smaller asymptotic variance than i.i.d. sampling from $π$. The control variates construction requires no extra computational effort but assumes that the expected value of $F$ under $q$ is analytically available. We illustrate this result by calculating the marginal likelihood in a linear regression model with prior-likelihood conflict and a non-conjugate prior. Furthermore, we propose an adaptive independent Metropolis algorithm that adapts the proposal density so that its KL divergence from the target is reduced. We demonstrate its applicability in Bayesian logistic and Gaussian process regression problems, and we rigorously justify our asymptotic arguments under easily verifiable and essentially minimal conditions.
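For orientation, here is a minimal sketch of the plain independent Metropolis sampler that the abstract builds on. It does not include the paper's control variates or the adaptive proposal. All function and variable names are hypothetical, and the target and proposal in the usage example are assumed toy choices.

```python
import math
import random


def independent_metropolis(log_target, sample_q, log_q, n_steps, x0, rng):
    """Independent Metropolis: proposals are drawn from q regardless of the current state.

    log_target may be unnormalised. Returns the chain and the acceptance rate.
    """
    x = x0
    log_w = log_target(x) - log_q(x)  # log importance weight of the current state
    chain, accepted = [], 0
    for _ in range(n_steps):
        y = sample_q(rng)
        log_w_y = log_target(y) - log_q(y)
        # Accept with probability min(1, w(y) / w(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_w_y - log_w:
            x, log_w = y, log_w_y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps


# Toy example (assumed): unnormalised N(1, 1) target, N(0, 2^2) proposal.
def log_target(x):
    return -0.5 * (x - 1.0) ** 2


def log_q(x):
    return -0.5 * (x / 2.0) ** 2 - math.log(2.0) - 0.5 * math.log(2.0 * math.pi)


def sample_q(rng):
    return rng.gauss(0.0, 2.0)


rng = random.Random(0)
chain, rate = independent_metropolis(log_target, sample_q, log_q, 20000, 0.0, rng)
print(sum(chain) / len(chain), rate)
```

The acceptance ratio depends only on the importance weights $π/q$, which is why the closeness of $q$ to $π$ governs the sampler's efficiency.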
ML Dec 13, 2021
How Good are Low-Rank Approximations in Gaussian Process Regression?
Constantinos Daskalakis, Petros Dellaportas, Aristeidis Panos
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as between their corresponding predictive densities, and we also bound the error between predictive mean vectors and between predictive covariance matrices computed using the exact versus using the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
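To make the first of the two low-rank approximations concrete, the following sketch builds random Fourier features for a one-dimensional squared-exponential (RBF) kernel and compares the approximate kernel value with the exact one. The kernel choice, the lengthscale, the number of features and all function names are assumptions for illustration; the abstract does not fix any of them.

```python
import math
import random


def make_rff(num_features, lengthscale, rng):
    """Draw frequencies from the RBF spectral density N(0, 1/lengthscale^2) and uniform phases."""
    freqs = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def rff_features(x, freqs, phases):
    """Map a scalar input to len(freqs) random Fourier features."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]


def approx_kernel(x, y, freqs, phases):
    """Low-rank kernel approximation: inner product of the two feature vectors."""
    fx = rff_features(x, freqs, phases)
    fy = rff_features(y, freqs, phases)
    return sum(a * b for a, b in zip(fx, fy))


def rbf_kernel(x, y, lengthscale):
    """Exact squared-exponential kernel."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


rng = random.Random(0)
freqs, phases = make_rff(5000, 1.0, rng)
print(approx_kernel(0.3, 1.1, freqs, phases), rbf_kernel(0.3, 1.1, 1.0))
```

The Monte Carlo error of the approximation shrinks at the usual rate of one over the square root of the number of features. The paper's bounds concern how such kernel error propagates to the GP and its predictive densities.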
CO Oct 27, 2021
Entropy-based adaptive Hamiltonian Monte Carlo
Marcel Hirt, Michalis K. Titsias, Petros Dellaportas
Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm to sample from an unnormalized probability distribution. A leapfrog integrator is commonly used to implement HMC in practice, but its performance can be sensitive to the choice of mass matrix used therein. We develop a gradient-based algorithm that allows for the adaptation of the mass matrix by encouraging the leapfrog integrator to have high acceptance rates while also exploring all dimensions jointly. In contrast to previous work that adapts the hyperparameters of HMC using some form of expected squared jumping distance, the adaptation strategy suggested here aims to increase sampling efficiency by maximizing an approximation of the proposal entropy. We illustrate that using multiple gradients in the HMC proposal can be beneficial compared to a single gradient-step in Metropolis-adjusted Langevin proposals. Empirical evidence suggests that the adaptation method can outperform different versions of HMC schemes by adjusting the mass matrix to the geometry of the target distribution and by providing some control on the integration time.
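The role of the mass matrix can be seen in a bare leapfrog integrator. The sketch below covers only the integrator with a diagonal mass matrix, not the entropy-based adaptation proposed in the paper. The function names, the Gaussian toy target and the step size are assumptions chosen for illustration.

```python
def leapfrog(q, p, grad_log_target, step_size, n_steps, inv_mass):
    """Leapfrog integration of Hamiltonian dynamics with a diagonal mass matrix.

    q, p: position and momentum (lists of floats); inv_mass: diagonal of M^{-1}.
    """
    q, p = list(q), list(p)
    g = grad_log_target(q)
    for _ in range(n_steps):
        # Half step for the momentum.
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, g)]
        # Full step for the position, scaled by the inverse mass.
        q = [qi + step_size * mi * pi for qi, mi, pi in zip(q, inv_mass, p)]
        g = grad_log_target(q)
        # Second half step for the momentum.
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, g)]
    return q, p


def hamiltonian(q, p, log_target, inv_mass):
    """Potential energy (negative log target) plus kinetic energy."""
    kinetic = 0.5 * sum(mi * pi * pi for mi, pi in zip(inv_mass, p))
    return -log_target(q) + kinetic


# Toy target (assumed): independent Gaussian with variances 1 and 100.
variances = [1.0, 100.0]


def log_target(q):
    return -0.5 * sum(qi * qi / v for qi, v in zip(q, variances))


def grad_log_target(q):
    return [-qi / v for qi, v in zip(q, variances)]


q0, p0 = [1.0, 10.0], [0.5, -0.05]
inv_mass = variances  # M equal to the target precision, so M^{-1} is the covariance
q1, p1 = leapfrog(q0, p0, grad_log_target, 0.1, 20, inv_mass)
print(hamiltonian(q0, p0, log_target, inv_mass), hamiltonian(q1, p1, log_target, inv_mass))
```

With the inverse mass matched to the target covariance, both coordinates evolve on the same time scale, so one step size keeps the energy error (and hence the rejection rate) small in every dimension.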
ML May 30, 2021
Scalable Marked Point Processes for Exchangeable and Non-Exchangeable Event Sequences
Aristeidis Panos, Ioannis Kosmidis, Petros Dellaportas
We adopt the interpretability offered by a parametric, Hawkes-process-inspired conditional probability mass function for the marks and apply variational inference techniques to derive a general and scalable inferential framework for marked point processes. The framework can handle both exchangeable and non-exchangeable event sequences with minimal tuning and without any pre-training. This contrasts with many parametric and non-parametric state-of-the-art methods that typically require pre-training and/or careful tuning, and can only handle exchangeable event sequences. The framework's competitive computational and predictive performance against other state-of-the-art methods is illustrated through real data experiments. Its attractiveness for large-scale applications is demonstrated through a case study involving all events occurring in an English Premier League season.
ML Apr 3, 2020
How Good are Low-Rank Approximations in Gaussian Process Regression?
Constantinos Daskalakis, Petros Dellaportas, Aristeidis Panos
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as between their corresponding predictive densities, and we also bound the error between predictive mean vectors and between predictive covariance matrices computed using the exact versus using the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
ML Nov 4, 2019
Gradient-based Adaptive Markov Chain Monte Carlo
Michalis K. Titsias, Petros Dellaportas
We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our method compared to traditional adaptive MCMC methods is that the adaptation occurs even when candidate state values are rejected. This is a highly desirable property of any adaptation strategy because the adaptation starts in early iterations even if the initial proposal distribution is far from optimal. We apply the framework to learn multivariate random walk Metropolis and Metropolis-adjusted Langevin proposals with full covariance matrices, and provide empirical evidence that our method can outperform other MCMC algorithms, including Hamiltonian Monte Carlo schemes.
ML Apr 15, 2019
Copula-like Variational Inference
Marcel Hirt, Petros Dellaportas, Alain Durmus
This paper considers a new family of variational distributions motivated by Sklar's theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension of the state space. The variational densities that we propose can be seen as arising from these copula-like densities used as base distributions on the hypercube, with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity $\mathcal{O}(d \log d)$. We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.
ML Jul 6, 2018
Fully Scalable Gaussian Processes using Subspace Inducing Inputs
Aristeidis Panos, Petros Dellaportas, Michalis K. Titsias
We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of treating a high number of training instances together with high-dimensional input data. Our key idea is a representation trick over the inducing variables called subspace inducing inputs. This is combined with certain matrix-preconditioning-based parametrizations of the variational distributions that lead to simplified and numerically stable variational lower bounds. Our illustrative applications are based on challenging extreme multi-label classification problems with the extra burden of a very large number of class labels. We demonstrate the usefulness of our approach by presenting predictive performances together with low computational times in datasets with an extremely large number of instances and input dimensions.
ML May 23, 2018
Scalable Bayesian Learning for State Space Models using Variational Inference with SMC Samplers
Marcel Hirt, Petros Dellaportas
We present a scalable approach to performing approximate fully Bayesian inference in generic state space models. The proposed method is an alternative to particle MCMC that provides fully Bayesian inference of both the dynamic latent states and the static parameters of the model. We build on recent advances in computational statistics that combine variational methods with sequential Monte Carlo sampling, and we demonstrate the advantages of performing full Bayesian inference over the static parameters rather than just performing variational EM approximations. We illustrate how our approach enables scalable inference in multivariate stochastic volatility models and self-exciting point process models that allow for flexible dynamics in the latent intensity function.