Ilja Klebanov

h-index6

5papers

94citations

Novelty53%

AI Score42

Ranked #61,564 of 194,257 authors (top 32%)#38 in ST (top 19%)

5 Papers

5.0STMay 19

Error Bounds for Importance Sampling with Estimated Proposal Distributions

Cathrine Aeckerle-Willems, Ilja Klebanov, Simon Weissmann

Importance sampling with data-driven proposal distributions is widely used in practice. A common workflow first generates an auxiliary sample of size $N$ from an approximation of the target distribution, constructs a density estimate $\hat q$ such as a kernel density estimator (KDE), and then draws $n$ importance samples from this learned proposal. Despite its practical relevance, the theoretical properties of this hierarchical procedure remain poorly understood, since classical importance sampling theory assumes a fixed proposal. We address this gap by deriving non-asymptotic error bounds for standard, defensive, and self-normalized importance sampling estimators with random proposals. Our results separate the Monte Carlo error, scaling as $n^{-1/2}$, from the proposal approximation error measured through the mean integrated absolute and squared errors (MIAE and MISE) of $\hat q$. To obtain explicit convergence rates in $(N,n)$, we establish MIAE and MISE bounds for KDEs constructed from geometrically ergodic Markov chains in stationary and non-stationary regimes. Combining these results yields quantitative guarantees for importance sampling with KDE-based proposals. Our theory provides practical guidance for selecting defensive mixture weights in a nonparametric importance sampling framework.

3.1MLOct 11, 2024

Deterministic Fokker-Planck Transport -- With Applications to Sampling, Variational Inference, Kernel Mean Embeddings & Sequential Monte Carlo

Ilja Klebanov

The Fokker-Planck equation can be reformulated as a continuity equation, which naturally suggests using the associated velocity field in particle flow methods. While the resulting probability flow ODE offers appealing properties - such as defining a gradient flow of the Kullback-Leibler divergence between the current and target densities with respect to the 2-Wasserstein distance - it relies on evaluating the current probability density, which is intractable in most practical applications. By closely examining the drawbacks of approximating this density via kernel density estimation, we uncover opportunities to turn these limitations into advantages in contexts such as variational inference, kernel mean embeddings, and sequential Monte Carlo.

6.6STAug 27, 2020

The linear conditional expectation in Hilbert space

Ilja Klebanov, Björn Sprungk, T. J. Sullivan

The linear conditional expectation (LCE) provides a best linear (or rather, affine) estimate of the conditional expectation and hence plays an important rôle in approximate Bayesian inference, especially the Bayes linear approach. This article establishes the analytical properties of the LCE in an infinite-dimensional Hilbert space context. In addition, working in the space of affine Hilbert--Schmidt operators, we establish a regularisation procedure for this LCE. As an important application, we obtain a simple alternative derivation and intuitive justification of the conditional mean embedding formula, a concept widely used in machine learning to perform the conditioning of random variables by embedding them into reproducing kernel Hilbert spaces.

17.6STDec 2, 2019

A Rigorous Theory of Conditional Mean Embeddings

Ilja Klebanov, Ingmar Schuster, T. J. Sullivan

Conditional mean embeddings (CMEs) have proven themselves to be a powerful tool in many machine learning applications. They allow the efficient conditioning of probability distributions within the corresponding reproducing kernel Hilbert spaces (RKHSs) by providing a linear-algebraic relation for the kernel mean embeddings of the respective joint and conditional probability distributions. Both centred and uncentred covariance operators have been used to define CMEs in the existing literature. In this paper, we develop a mathematically rigorous theory for both variants, discuss the merits and problems of each, and significantly weaken the conditions for applicability of CMEs. In the course of this, we demonstrate a beautiful connection to Gaussian conditioning in Hilbert spaces.

7.8MLMay 18, 2018

Markov Chain Importance Sampling -- a highly efficient estimator for MCMC

Ingmar Schuster, Ilja Klebanov

Markov chain (MC) algorithms are ubiquitous in machine learning and statistics and many other disciplines. Typically, these algorithms can be formulated as acceptance rejection methods. In this work we present a novel estimator applicable to these methods, dubbed Markov chain importance sampling (MCIS), which efficiently makes use of rejected proposals. For the unadjusted Langevin algorithm, it provides a novel way of correcting the discretization error. Our estimator satisfies a central limit theorem and improves on error per CPU cycle, often to a large extent. As a by-product it enables estimating the normalizing constant, an important quantity in Bayesian machine learning and statistics.