NA Aug 1, 2018
An EM based Iterative Method for Solving Large Sparse Linear Systems
Minwoo Chae, Stephen G. Walker
We propose a novel iterative algorithm for solving a large sparse linear system. The method is based on the EM algorithm. If the system has a unique solution, the algorithm guarantees convergence with a geometric rate. Otherwise, convergence to a minimal Kullback--Leibler divergence point is guaranteed. The algorithm is easy to code and competitive with other iterative algorithms.
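As a rough illustration of an EM-style iteration for a linear system, the sketch below uses the classical multiplicative EM update for systems with nonnegative entries (the kind used in emission tomography). It is not necessarily the authors' exact algorithm; the function name `em_solve`, the nonnegativity assumption on `A` and `b`, and the toy system are all assumptions made for the example.

```python
def em_solve(A, b, iterations=500):
    """Classical multiplicative EM update for A x = b.

    Assumes every entry of A and b is nonnegative, each row and column of A
    has a positive entry, and the starting point is strictly positive.
    Each step decreases the Kullback-Leibler divergence between b and A x.
    """
    m, n = len(A), len(A[0])
    col_sums = [sum(A[i][j] for i in range(m)) for j in range(n)]
    x = [1.0] * n  # any strictly positive starting point
    for _ in range(iterations):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        x = [
            x[j] / col_sums[j] * sum(A[i][j] * b[i] / Ax[i] for i in range(m))
            for j in range(n)
        ]
    return x


# Toy system (hypothetical numbers) whose unique solution is x = (1, 2).
A = [[2.0, 1.0], [1.0, 3.0]]
b = [4.0, 7.0]
x = em_solve(A, b)
```

The update needs only matrix-vector products with `A` and its transpose, which is why iterations of this type suit large sparse systems; the dense nested lists here are for readability only.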
ML Mar 3
Scalable Uncertainty Quantification for Black-Box Density-Based Clustering
Nicola Bariletto, Stephen G. Walker
We introduce a novel framework for uncertainty quantification in clustering. By combining the martingale posterior paradigm with density-based clustering, uncertainty in the estimated density is naturally propagated to the clustering structure. The approach scales effectively to high-dimensional and irregularly shaped data by leveraging modern neural density estimators and GPU-friendly parallel computation. We establish frequentist consistency guarantees and validate the methodology on synthetic and real data.
ML Aug 28, 2025
Weighted Support Points from Random Measures: An Interpretable Alternative for Generative Modeling
Peiqi Zhao, Carlos E. Rodríguez, Ramsés H. Mena et al.
Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer a compact yet informative representation of the original data. We build on this idea to introduce a generative modeling framework based on random weighted support points, where the randomness arises from a weighting scheme inspired by the Dirichlet process and the Bayesian bootstrap. The proposed method generates diverse and interpretable sample sets from a fixed dataset, without relying on probabilistic modeling assumptions or neural network architectures. We present the theoretical formulation of the method and develop an efficient optimization algorithm based on the Convex--Concave Procedure (CCP). Empirical results on the MNIST and CelebA-HQ datasets show that our approach produces high-quality and diverse outputs at a fraction of the computational cost of black-box alternatives such as Generative Adversarial Networks (GANs) or Denoising Diffusion Probabilistic Models (DDPMs). These results suggest that random weighted support points offer a principled, scalable, and interpretable alternative for generative modeling. A key feature is their ability to produce genuinely interpolative samples that preserve underlying data structure.
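The weighting idea can be illustrated with the standard Bayesian bootstrap, which draws Dirichlet(1, ..., 1) weights over the observations. The sketch below covers only that weighting step, not the support-point optimization or the CCP solver; the function name and the toy data are assumptions for illustration.

```python
import random


def bayesian_bootstrap_weights(n, rng):
    """Draw Dirichlet(1, ..., 1) weights by normalising n Exp(1) variables."""
    gammas = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)
data = [2.0, 4.0, 6.0, 8.0]  # hypothetical observations
weights = bayesian_bootstrap_weights(len(data), rng)

# Each draw of weights gives a different random reweighting of the fixed data.
weighted_mean = sum(w * x for w, x in zip(weights, data))
```

Repeating the draw yields a different weight vector each time, which is the source of randomness that a weighted support-point construction could then build on.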
ST Jan 10, 2022
Bayesian Consistency with the Supremum Metric
Nhat Ho, Stephen G. Walker
We present simple conditions for Bayesian consistency in the supremum metric. The key to the technique is a triangle inequality which allows us to explicitly use weak convergence, a consequence of the standard Kullback--Leibler support condition for the prior. A further condition is to ensure that smoothed versions of densities are not too far from the original density, thus dealing with densities which could track the data too closely. A key result of the paper is that we demonstrate supremum consistency using weaker conditions compared to those currently used to secure $\mathbb{L}_1$ consistency.
CO Jul 22, 2021
On Integral Theorems and their Statistical Properties
Nhat Ho, Stephen G. Walker
We introduce a class of integral theorems based on cyclic functions and Riemann sums approximating integrals. The Fourier integral theorem, derived as a combination of a transform and inverse transform, arises as a special case. The integral theorems provide natural estimators of density functions via Monte Carlo methods. Assessments of the quality of the density estimators can be used to obtain optimal cyclic functions, alternatives to the sine function, which minimize square integrals. Our proof techniques rely on a variational approach in ordinary differential equations and the Cauchy residue theorem in complex analysis.
ME Jun 11, 2021
Statistical Analysis from the Fourier Integral Theorem
Nhat Ho, Stephen G. Walker
Taking the Fourier integral theorem as our starting point, in this paper we focus on natural Monte Carlo and fully nonparametric estimators of multivariate distributions and conditional distribution functions. We do this without the need for any estimated covariance matrix or dependence structure between variables. These aspects arise immediately from the integral theorem. Being able to model multivariate data sets using conditional distribution functions, we can study a number of problems, such as prediction for Markov processes, estimation of mixing distribution functions which depend on covariates, and general multivariate data. The estimators are explicit and Monte Carlo based, and require no recursive or iterative algorithms.
LG Mar 11, 2021
A Reinforcement Learning Based Approach to Play Calling in Football
Preston Biro, Stephen G. Walker
With the vast amount of data collected on football and the growth of computing abilities, many games involving decision choices can be optimized. The underlying rule is the maximization of an expected utility of outcomes and the law of large numbers. The data available allows us to compute with high accuracy the probabilities of outcomes of decisions, and the well-defined points system in the game allows us to have the necessary terminal utilities. With some well-established theory we can then optimize choices at a single-play level.
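The decision rule described here, choosing the play with the largest expected utility, can be sketched in a few lines. The play names, probabilities, and utilities below are invented purely for illustration and are not taken from the paper or from any data.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over a play's possible outcomes."""
    return sum(p * u for p, u in outcomes)


def best_play(plays):
    """Return the name of the play with the largest expected utility."""
    return max(plays, key=lambda name: expected_utility(plays[name]))


# Hypothetical fourth-down decision: (probability, utility in points) pairs.
plays = {
    "go_for_it": [(0.55, 2.5), (0.45, -1.5)],
    "punt": [(1.0, 0.3)],
}
choice = best_play(plays)
```

In practice the probabilities would be estimated from play-by-play data and the utilities from the value of the resulting game state; the sketch only shows the final maximization step.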
ST Dec 28, 2020
Multivariate Smoothing via the Fourier Integral Theorem and Fourier Kernel
Nhat Ho, Stephen G. Walker
Starting with the Fourier integral theorem, we present natural Monte Carlo estimators of multivariate functions including densities, mixing densities, transition densities, regression functions, and the search for modes of multivariate density functions (modal regression). Rates of convergence are established and, in many cases, provide superior rates to current standard estimators such as those based on kernels, including kernel density estimators and kernel regression functions. Numerical illustrations are presented.
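A minimal sketch of a density estimator of this kind, assuming the usual form of the Fourier integral theorem: each observation contributes a product over coordinates of sin(R(y_j - x_j)) / (y_j - x_j), and the average is divided by π^d. The function name, the cutoff value `R = 3.0`, and the simulated standard normal sample are assumptions for the example, not choices made in the paper.

```python
import math
import random


def fourier_density_estimate(y, sample, R):
    """Monte Carlo density estimate at point y from the Fourier integral theorem.

    y      : sequence of d coordinates
    sample : list of observations, each a sequence of d coordinates
    R      : cutoff (smoothing) parameter; larger R means less smoothing
    """
    d = len(y)
    total = 0.0
    for x in sample:
        term = 1.0
        for yj, xj in zip(y, x):
            diff = yj - xj
            # sin(R * diff) / diff tends to R as diff tends to 0.
            term *= R if diff == 0.0 else math.sin(R * diff) / diff
        total += term
    return total / (len(sample) * math.pi ** d)


rng = random.Random(1)
sample = [(rng.gauss(0.0, 1.0),) for _ in range(5000)]  # simulated N(0, 1) data
estimate_at_zero = fourier_density_estimate((0.0,), sample, R=3.0)
```

No covariance matrix or bandwidth matrix enters the estimator: the same scalar `R` is applied to every coordinate, and the estimate at each point is an explicit average with no iteration. Unlike a conventional kernel estimate, it can take small negative values because the sine kernel is not nonnegative.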
ME Feb 4, 2018
Testing to distinguish measures on metric spaces
Andrew J. Blumberg, Prithwish Bhaumik, Stephen G. Walker
We study the problem of distinguishing between two distributions on a metric space; i.e., given metric measure spaces $({\mathbb X}, d, \mu_1)$ and $({\mathbb X}, d, \mu_2)$, we are interested in the problem of determining from finite data whether $\mu_1 = \mu_2$. The key is to use pairwise distances between observations and, employing a reconstruction theorem of Gromov, we can perform such a test using a two-sample Kolmogorov--Smirnov test. A real-data analysis using phylogenetic trees and flu data is presented.
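One plausible reading of the distance-based idea is sketched below: reduce each sample to a set of pairwise distances and compare the two sets with the two-sample Kolmogorov--Smirnov statistic. This is an illustration only and may differ from the paper's construction. Using disjoint pairs (so that the distances within a sample are independent), the Euclidean metric, and the toy points are all assumptions, and no p-value is computed.

```python
import math


def euclidean(p, q):
    """Euclidean distance; stands in for the metric d of the space."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))


def disjoint_pair_distances(points, dist):
    """Distances between points (0, 1), (2, 3), ...; independent for i.i.d. data."""
    return [dist(points[k], points[k + 1]) for k in range(0, len(points) - 1, 2)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


# Toy samples (hypothetical): the second is more spread out than the first.
sample_1 = [(0, 0), (1, 0), (0, 0), (0, 1)]
sample_2 = [(0, 0), (3, 0), (0, 0), (0, 4)]
d1 = disjoint_pair_distances(sample_1, euclidean)
d2 = disjoint_pair_distances(sample_2, euclidean)
statistic = ks_statistic(d1, d2)
```

Because only distances enter the computation, the same code applies to any metric space once `euclidean` is replaced by the relevant metric, for example a distance between phylogenetic trees.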