Jonathan Weare

h-index21

11papers

327citations

Novelty51%

AI Score42

Ranked #58,355 of 194,257 authors (top 30%)#13,255 in LG (top 33%)

11 Papers

5.1PRApr 9, 2014

Improved diffusion Monte Carlo

Martin Hairer, Jonathan Weare

We propose a modification, based on the RESTART (repetitive simulation trials after reaching thresholds) and DPR (dynamics probability redistribution) rare event simulation algorithms, of the standard diffusion Monte Carlo (DMC) algorithm. The new algorithm has a lower variance per workload, regardless of the regime considered. In particular, it makes it feasible to use DMC in situations where the "naïve" generalisation of the standard algorithm would be impractical, due to an exponential explosion of its variance. We numerically demonstrate the effectiveness of the new algorithm on a standard rare event simulation problem (probability of an unlikely transition in a Lennard-Jones cluster), as well as a high-frequency data assimilation problem.

1.2NAJun 4, 2008

Variance reduction for particle filters of systems with time-scale separation

Dror Givon, Panagiotis Stinis, Jonathan Weare

We present a particle filter construction for a system that exhibits time-scale separation. The separation of time-scales allows two simplifications that we exploit: i) The use of the averaging principle for the dimensional reduction of the system needed to solve for each particle and ii) the factorization of the transition probability which allows the Rao-Blackwellization of the filtering step. Both simplifications can be implemented using the coarse projective integration framework. The resulting particle filter is faster and has smaller variance than the particle filter based on the original system. The method is tested on a multiscale stochastic differential equation and on a multiscale pure jump diffusion motivated by chemical reactions.

1.2CHEM-PHNov 11, 2022

Understanding and eliminating spurious modes in variational Monte Carlo using collective variables

Huan Zhang, Robert J. Webber, Michael Lindsey et al.

The use of neural network parametrizations to represent the ground state in variational Monte Carlo (VMC) calculations has generated intense interest in recent years. However, as we demonstrate in the context of the periodic Heisenberg spin chain, this approach can produce unreliable wave function approximations. One of the most obvious signs of failure is the occurrence of random, persistent spikes in the energy estimate during training. These energy spikes are caused by regions of configuration space that are over-represented by the wave function density, which are called ``spurious modes'' in the machine learning literature. After exploring these spurious modes in detail, we demonstrate that a collective-variable-based penalization yields a substantially more robust training procedure, preventing the formation of spurious modes and improving the accuracy of energy estimates. Because the penalization scheme is cheap to implement and is not specific to the particular model studied here, it can be extended to other applications of VMC where a reasonable choice of collective variable is available.

7.9LGSep 30, 2024Code

Using pretrained graph neural networks with token mixers as geometric featurizers for conformational dynamics

Zihan Pengmei, Chatipat Lorpaiboon, Spencer C. Guo et al.

Identifying informative low-dimensional features that characterize dynamics in molecular simulations remains a challenge, often requiring extensive manual tuning and system-specific knowledge. Here, we introduce geom2vec, in which pretrained graph neural networks (GNNs) are used as universal geometric featurizers. By pretraining equivariant GNNs on a large dataset of molecular conformations with a self-supervised denoising objective, we obtain transferable structural representations that are useful for learning conformational dynamics without further fine-tuning. We show how the learned GNN representations can capture interpretable relationships between structural units (tokens) by combining them with expressive token mixers. Importantly, decoupling training the GNNs from training for downstream tasks enables analysis of larger molecular graphs (such as small proteins at all-atom resolution) with limited computational resources. In these ways, geom2vec eliminates the need for manual feature selection and increases the robustness of simulation analyses.

1.2CHEM-PHJan 28

Quantum statistics from classical simulations via generative Gibbs sampling

Weizhou Wang, Xuanxi Zhang, Jonathan Weare et al.

Accurate simulation of nuclear quantum effects is essential for molecular modeling but expensive using path integral molecular dynamics (PIMD). We present GG-PI, a ring-polymer-based framework that combines generative modeling of the single-bead conditional density with Gibbs sampling to recover quantum statistics from classical simulation data. GG-PI uses inexpensive standard classical simulations or existing data for training and allows transfer across temperatures without retraining. On standard test systems, GG-PI significantly reduces wall clock time compared to PIMD. Our approach extends easily to a wide range of problems with similar Markov structure.

9.8LGOct 8, 2023

Improved Active Learning via Dependent Leverage Score Sampling

Atsushi Shimizu, Xiaoou Cheng, Christopher Musco et al.

We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the \emph{pivotal sampling algorithm}, which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to $50\%$. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak \emph{one-sided $\ell_{\infty}$ independence condition} (which includes pivotal sampling) can actively learn $d$ dimensional linear functions with $O(d\log d)$ samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under $\ell_{\infty}$ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound on $O(d)$ samples.

9.2MLAug 20, 2024

Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias

Yifan Chen, Xiaoou Cheng, Jonathan Niles-Weed et al.

The unadjusted Langevin algorithm is commonly used to sample probability distributions in extremely high-dimensional settings. However, existing analyses of the algorithm for strongly log-concave distributions suggest that, as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error in the $W_2$ metric scales in proportion to $d$ or $\sqrt{d}$. In this paper, we argue that, despite this poor scaling of the $W_2$ error for the full set of variables, the behavior for a small number of variables can be significantly better: a number of iterations proportional to $K$, up to logarithmic terms in $d$, often suffices for the algorithm to converge to within a desired $W_2$ error for all $K$-marginals. We refer to this effect as delocalization of bias. We show that the delocalization effect does not hold universally and prove its validity for Gaussian distributions and strongly log-concave distributions with certain sparse interactions. Our analysis relies on a novel $W_{2,\ell^\infty}$ metric to measure convergence. A key technical challenge we address is the lack of a one-step contraction property in this metric. Finally, we use asymptotic arguments to explore potential generalizations of the delocalization effect beyond the Gaussian and sparse interactions setting.

13.4AO-PHOct 19, 2024Code

Can AI weather models predict out-of-distribution gray swan tropical cyclones?

Y. Qiang Sun, Pedram Hassanzadeh, Mohsen Zand et al.

Predicting gray swan weather extremes, which are possible but so rare that they are absent from the training dataset, is a major concern for AI weather models and long-term climate emulators. An important open question is whether AI models can extrapolate from weaker weather events present in the training set to stronger, unseen weather extremes. To test this, we train independent versions of the AI model FourCastNet on the 1979-2015 ERA5 dataset with all data, or with Category 3-5 tropical cyclones (TCs) removed, either globally or only over the North Atlantic or Western Pacific basin. We then test these versions of FourCastNet on 2018-2023 Category 5 TCs (gray swans). All versions yield similar accuracy for global weather, but the one trained without Category 3-5 TCs cannot accurately forecast Category 5 TCs, indicating that these models cannot extrapolate from weaker storms. The versions trained without Category 3-5 TCs in one basin show some skill forecasting Category 5 TCs in that basin, suggesting that FourCastNet can generalize across tropical basins. This is encouraging and surprising because regional information is implicitly encoded in inputs. Given that current state-of-the-art AI weather and climate models have similar learning strategies, we expect our findings to apply to other models. Other types of weather extremes need to be similarly investigated. Our work demonstrates that novel learning strategies are needed for AI models to reliably provide early warning or estimated statistics for the rarest, most impactful TCs, and, possibly, other weather extremes.

4.3SIJun 25, 2020Code

A metric on directed graphs and Markov chains based on hitting probabilities

Zachary M. Boyd, Nicolas Fraiman, Jeremy L. Marzuola et al.

The shortest-path, commute time, and diffusion distances on undirected graphs have been widely employed in applications such as dimensionality reduction, link prediction, and trip planning. Increasingly, there is interest in using asymmetric structure of data derived from Markov chains and directed graphs, but few metrics are specifically adapted to this task. We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain and, in particular, on any Markov chain derived from a directed graph. Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity. Notably, our metric is insensitive to shortest and average walk distances, thus giving new information compared to existing metrics. We use possible degeneracies in the metric to develop an interesting structural theory of directed graphs and explore a related quotienting procedure. Our metric can be computed in $O(n^3)$ time, where $n$ is the number of states, and in examples we scale up to $n=10,000$ nodes and $\approx 38M$ edges on a desktop computer. In several examples, we explore the nature of the metric, compare it to alternative methods, and demonstrate its utility for weak recovery of community structure in dense graphs, visualization, structure recovering, dynamics exploration, and multiscale cluster detection.

11.3MEJul 13, 2016

Ensemble preconditioning for Markov chain Monte Carlo simulation

Charles Matthews, Jonathan Weare, Benedict Leimkuhler

We describe parallel Markov chain Monte Carlo methods that propagate a collective ensemble of paths, with local covariance information calculated from neighboring replicas. The use of collective dynamics eliminates multiplicative noise and stabilizes the dynamics thus providing a practical approach to difficult anisotropic sampling problems in high dimensions. Numerical experiments with model problems demonstrate that dramatic potential speedups, compared to various alternative schemes, are attainable.

1.2NAOct 9, 2015

Sharp entrywise perturbation bounds for Markov chains

Erik Thiede, Brian Van Koten, Jonathan Weare

For many Markov chains of practical interest, the invariant distribution is extremely sensitive to perturbations of some entries of the transition matrix, but insensitive to others; we give an example of such a chain, motivated by a problem in computational statistical physics. We have derived perturbation bounds on the relative error of the invariant distribution that reveal these variations in sensitivity. Our bounds are sharp, we do not impose any structural assumptions on the transition matrix or on the perturbation, and computing the bounds has the same complexity as computing the invariant distribution or computing other bounds in the literature. Moreover, our bounds have a simple interpretation in terms of hitting times, which can be used to draw intuitive but rigorous conclusions about the sensitivity of a chain to various types of perturbations.