David H. Wolpert

AI
h-index43
18papers
409citations
Novelty43%
AI Score45

18 Papers

1.7MLAug 24, 2011
Using Supervised Learning to Improve Monte Carlo Integral Estimation

Brendan Tracey, David Wolpert, Juan J. Alonso

Monte Carlo (MC) techniques are often used to estimate integrals of a multivariate function using randomly generated samples of the function. In light of the increasing interest in uncertainty quantification and robust design applications in aerospace engineering, the calculation of expected values of such functions (e.g. performance measures) becomes important. However, MC techniques often suffer from high variance and slow convergence as the number of samples increases. In this paper we present Stacked Monte Carlo (StackMC), a new method for post-processing an existing set of MC samples to improve the associated integral estimate. StackMC is based on the supervised learning techniques of fitting functions and cross validation. It should reduce the variance of any type of Monte Carlo integral estimate (simple sampling, importance sampling, quasi-Monte Carlo, MCMC, etc.) without adding bias. We report on an extensive set of experiments confirming that the StackMC estimate of an integral is more accurate than both the associated unprocessed Monte Carlo estimate and an estimate based on a functional fit to the MC samples. These experiments run over a wide variety of integration spaces, numbers of sample points, dimensions, and fitting functions. In particular, we apply StackMC in estimating the expected value of the fuel burn metric of future commercial aircraft and in estimating sonic boom loudness measures. We compare the efficiency of StackMC with that of more standard methods and show that for negligible additional computational cost significant increases in accuracy are gained.

6.9SIMay 3
Computational foundations of the human world

Marcus J. Hamilton, Abhishek Yadav, Harrison Hartle et al.

Human societies continuously transform scattered information into collective judgments and coordinated action, whether through markets discovering prices, governments allocating resources, communities enforcing norms, or science converging on reliable claims. Importantly, the computational difficulty of collective decision-making, particularly the time and communication required to reach solutions, imposes fundamental constraints on social organization. While theoretical computer science offers formal tools for analyzing such problems, for instance, by analyzing resource requirements, including time and memory, surprisingly, there is no domain of social science that focuses on the nature of computation in the human world. This perspective argues that we now have the opportunity to deploy these computational frameworks to study human social organization, opening research directions at the intersection of computer science and social science. We highlight core social phenomena that can be framed as computational, including (i) distributed consensus and coordinated action, (ii) societal restructuring with scale, (iii) hierarchical and modular structure, and (iv) externalized memory systems. We identify several concepts from theoretical computer science that may provide insight into these phenomena, especially emphasizing more recently developed approaches beyond the paradigm of Turing~Machines and worst-case computational complexity.

1.2NAOct 6, 2008
Bias-Variance Techniques for Monte Carlo Optimization: Cross-validation for the CE Method

Dev Rajnarayan, David Wolpert

In this paper, we examine the CE method in the broad context of Monte Carlo Optimization (MCO) and Parametric Learning (PL), a type of machine learning. A well-known overarching principle used to improve the performance of many PL algorithms is the bias-variance tradeoff. This tradeoff has been used to improve PL algorithms ranging from Monte Carlo estimation of integrals, to linear estimation, to general statistical estimation. Moreover, as described by, MCO is very closely related to PL. Owing to this similarity, the bias-variance tradeoff affects MCO performance, just as it does PL performance. In this article, we exploit the bias-variance tradeoff to enhance the performance of MCO algorithms. We use the technique of cross-validation, a technique based on the bias-variance tradeoff, to significantly improve the performance of the Cross Entropy (CE) method, which is an MCO algorithm. In previous work we have confirmed that other PL techniques improve the perfomance of other MCO algorithms. We conclude that the many techniques pioneered in PL could be investigated as ways to improve MCO algorithms in general, and the CE method in particular.

4.3HIST-PHAug 8, 2022
What can we know about that which we cannot even imagine?

David H. Wolpert

In this essay I will consider a sequence of questions. The first questions concern the biological function of intelligence in general, and cognitive prostheses of human intelligence in particular. These will lead into questions concerning human language, perhaps the most important cognitive prosthesis humanity has ever developed. While it is traditional to rhapsodize about the cognitive power encapsulated in human language, I will emphasize how horribly limited human language is - and therefore how limited our cognitive abilities are, despite their being augmented with language. This will lead to questions of whether human mathematics, being ultimately formulated in terms of human language, is also deeply limited. I will then combine these questions to pose a partial, sort-of, sideways answer to the guiding concern of this essay: what we can ever discern about that we cannot even conceive?

8.3LOMar 19
Implications of computer science theory for the simulation hypothesis

David H. Wolpert

The simulation hypothesis has recently excited renewed interest in the physics and philosophy communities. However, the hypothesis specifically concerns {\textit{computers}} that simulate physical universes. So to formally investigate the hypothesis, we need to understand it in terms of computer science (CS) theory. In addition we need a formal way to couple CS theory with physics. Here I couple those fields by using the physical Church-Turing thesis. This allow me to exploit Kleene's second recursion, to prove that not only is it possible for {us} to be a simulation being run on a computer, but that we might be in a simulation being run a computer \emph{by us}. In such a ``self-simulation'', there would be two identical instances of us, both equally ``real''. I then use Rice's theorem to derive impossibility results concerning simulation and self-simulation; derive implications for (self-)simulation if we are being simulated in a program using fully homomorphic encryption; and briefly investigate the graphical structure of universes simulating other universes which contain computers running their own simulations. I end by describing some of the possible avenues for future research. While motivated in terms of the simulation hypothesis, the results in this paper are direct consequences of the Church-Turing thesis. So they apply far more broadly than the simulation hypothesis.

1.2APDec 11, 2021
The Past as a Stochastic Process

David H. Wolpert, Michael H. Price, Stefani A. Crabtree et al.

Historical processes manifest remarkable diversity. Nevertheless, scholars have long attempted to identify patterns and categorize historical actors and influences with some success. A stochastic process framework provides a structured approach for the analysis of large historical datasets that allows for detection of sometimes surprising patterns, identification of relevant causal actors both endogenous and exogenous to the process, and comparison between different historical cases. The combination of data, analytical tools and the organizing theoretical framework of stochastic processes complements traditional narrative approaches in history and archaeology.

4.4LGMar 22, 2021
The Implications of the No-Free-Lunch Theorems for Meta-induction

David H. Wolpert

The important recent book by G. Schurz appreciates that the no-free-lunch theorems (NFL) have major implications for the problem of (meta) induction. Here I review the NFL theorems, emphasizing that they do not only concern the case where there is a uniform prior -- they prove that there are "as many priors" (loosely speaking) for which any induction algorithm $A$ out-generalizes some induction algorithm $B$ as vice-versa. Importantly though, in addition to the NFL theorems, there are many {free lunch} theorems. In particular, the NFL theorems can only be used to compare the {marginal} expected performance of an induction algorithm $A$ with the marginal expected performance of an induction algorithm $B$. There is a rich set of free lunches which instead concern the statistical correlations among the generalization errors of induction algorithms. As I describe, the meta-induction algorithms that Schurz advocate as a "solution to Hume's problem" are just an example of such a free lunch based on correlations among the generalization errors of induction algorithms. I end by pointing out that the prior that Schurz advocates, which is uniform over bit frequencies rather than bit patterns, is contradicted by thousands of experiments in statistical physics and by the great success of the maximum entropy procedure in inductive inference.

2.3LOOct 28, 2020
Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes

David H. Wolpert, David Kinney

We present a computational model of mathematical reasoning according to which mathematics is a fundamentally stochastic process. That is, on our model, whether or not a given formula is deemed a theorem in some axiomatic system is not a matter of certainty, but is instead governed by a probability distribution. We then show that this framework gives a compelling account of several aspects of mathematical practice. These include: 1) the way in which mathematicians generate research programs, 2) the applicability of Bayesian models of mathematical heuristics, 3) the role of abductive reasoning in mathematics, 4) the way in which multiple proofs of a proposition can strengthen our degree of belief in that proposition, and 5) the nature of the hypothesis that there are multiple formal systems that are isomorphic to physically possible universes. Thus, by embracing a model of mathematics as not perfectly predictable, we generate a new and fruitful perspective on the epistemology and practice of mathematics.

10.1LGJul 21, 2020
What is important about the No Free Lunch theorems?

David H. Wolpert

The No Free Lunch theorems prove that under a uniform distribution over induction problems (search problems or learning problems), all induction algorithms perform equally. As I discuss in this chapter, the importance of the theorems arises by using them to analyze scenarios involving {non-uniform} distributions, and to compare different algorithms, without any assumption about the distribution over problems at all. In particular, the theorems prove that {anti}-cross-validation (choosing among a set of candidate algorithms based on which has {worst} out-of-sample behavior) performs as well as cross-validation, unless one makes an assumption -- which has never been formalized -- about how the distribution over induction problems, on the one hand, is related to the set of algorithms one is choosing among using (anti-)cross validation, on the other. In addition, they establish strong caveats concerning the significance of the many results in the literature which establish the strength of a particular algorithm without assuming a particular distribution. They also motivate a ``dictionary'' between supervised learning and improve blackbox optimization, which allows one to ``translate'' techniques from supervised learning into the domain of blackbox optimization, thereby strengthening blackbox optimization algorithms. In addition to these topics, I also briefly discuss their implications for philosophy of science.

1.2STAT-MECHJan 7, 2020
Minimal entropy production due to constraints on rate matrix dependencies in multipartite processes

David H Wolpert

I consider multipartite processes in which there are constraints on each subsystem's rate matrix, restricting which other subsystems can directly affect its dynamics. I derive a strictly nonzero lower bound on the minimal achievable entropy production rate of the process in terms of these constraints on the rate matrices of its subsystems. The bound is based on constructing counterfactual rate matrices, in which some subsystems are held fixed while the others are allowed to evolve. This bound is related to the "learning rate" of stationary bipartite systems, and more generally to the "information flow" in bipartite systems.

2.3STAT-MECHNov 7, 2019
Uncertainty relations and fluctuation theorems for Bayes nets

David H. Wolpert

Recent research has considered the stochastic thermodynamics of multiple interacting systems, representing the overall system as a Bayes net. I derive fluctuation theorems governing the entropy production (EP)of arbitrary sets of the systems in such a Bayes net. I also derive ``conditional'' fluctuation theorems, governing the distribution of EP in one set of systems conditioned on the EP of a different set of systems. I then derive thermodynamic uncertainty relations relating the EP of the overall system to the precisions of probability currents within the individual systems.

19.7AISep 19, 2017Code
Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes

Kunal Menda, Yi-Chun Chen, Justin Grana et al.

The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios in which the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely separated events occur, adversely affecting the policies learned. In addition, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.

27.5ITMay 6, 2017Code
Nonlinear Information Bottleneck

Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert

Information bottleneck (IB) is a technique for extracting information in one random variable $X$ that is relevant for predicting another random variable $Y$. IB works by encoding $X$ in a compressed "bottleneck" random variable $M$ from which $Y$ can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete $X$ and $Y$ with small state spaces, and continuous $X$ and $Y$ with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous $X$ and $Y$, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed "variational IB" method on several real-world datasets.

3.1CRSep 1, 2016
A Likelihood Ratio Detector for Identifying Within-Perimeter Computer Network Attacks

Justin Grana, David Wolpert, Joshua Neil et al.

The rapid detection of attackers within firewalls of enterprise computer net- works is of paramount importance. Anomaly detectors address this problem by quantifying deviations from baseline statistical models of normal network behav- ior and signaling an intrusion when the observed data deviates significantly from the baseline model. However, many anomaly detectors do not take into account plausible attacker behavior. As a result, anomaly detectors are prone to a large number of false positives due to unusual but benign activity. This paper first in- troduces a stochastic model of attacker behavior which is motivated by real world attacker traversal. Then, we develop a likelihood ratio detector that compares the probability of observed network behavior under normal conditions against the case when an attacker has possibly compromised a subset of hosts within the network. Since the likelihood ratio detector requires integrating over the time each host be- comes compromised, we illustrate how to use Monte Carlo methods to compute the requisite integral. We then present Receiver Operating Characteristic (ROC) curves for various network parameterizations that show for any rate of true posi- tives, the rate of false positives for the likelihood ratio detector is no higher than that of a simple anomaly detector and is often lower. We conclude by demon- strating the superiority of the proposed likelihood ratio detector when the network topologies and parameterizations are extracted from real-world networks.

2.5MLJun 7, 2016
Reducing the error of Monte Carlo Algorithms by Learning Control Variates

Brendan D. Tracey, David H. Wolpert

Monte Carlo (MC) sampling algorithms are an extremely widely-used technique to estimate expectations of functions f(x), especially in high dimensions. Control variates are a very powerful technique to reduce the error of such estimates, but in their conventional form rely on having an accurate approximation of f, a priori. Stacked Monte Carlo (StackMC) is a recently introduced technique designed to overcome this limitation by fitting a control variate to the data samples themselves. Done naively, forming a control variate to the data would result in overfitting, typically worsening the MC algorithm's performance. StackMC uses in-sample / out-sample techniques to remove this overfitting. Crucially, it is a post-processing technique, requiring no additional samples, and can be applied to data generated by any MC estimator. Our preliminary experiments demonstrated that StackMC improved the estimates of expectations when it was used to post-process samples produces by a "simple sampling" MC estimator. Here we substantially extend this earlier work. We provide an in-depth analysis of the StackMC algorithm, which we use to construct an improved version of the original algorithm, with lower estimation error. We then perform experiments of StackMC on several additional kinds of MC estimators, demonstrating improved performance when the samples are generated via importance sampling, Latin-hypercube sampling and quasi-Monte Carlo sampling. We also show how to extend StackMC to combine multiple fitting functions, and how to apply it to discrete input spaces x.

5.9ITSep 25, 2014
Optimal high-level descriptions of dynamical systems

David H. Wolpert, Joshua A. Grochow, Eric Libby et al.

To analyze high-dimensional systems, many fields in science and engineering rely on high-level descriptions, sometimes called "macrostates," "coarse-grainings," or "effective theories". Examples of such descriptions include the thermodynamic properties of a large collection of point particles undergoing reversible dynamics, the variables in a macroeconomic model describing the individuals that participate in an economy, and the summary state of a cell composed of a large set of biochemical networks. Often these high-level descriptions are constructed without considering the ultimate reason for needing them in the first place. Here, we formalize and quantify one such purpose: the need to predict observables of interest concerning the high-dimensional system with as high accuracy as possible, while minimizing the computational cost of doing so. The resulting State Space Compression (SSC) framework provides a guide for how to solve for the {optimal} high-level description of a given dynamical system, rather than constructing it based on human intuition alone. In this preliminary report, we introduce SSC, and illustrate it with several information-theoretic quantifications of "accuracy", all with different implications for the optimal compression. We also discuss some other possible applications of SSC beyond the goal of accurate prediction. These include SSC as a measure of the complexity of a dynamical system, and as a way to quantify information flow between the scales of a system.

5.4AIAug 9, 2014
Predicting the behavior of interacting humans by fusing data from multiple sources

Erik J. Schlicht, Ritchie Lee, David H. Wolpert et al.

Multi-fidelity methods combine inexpensive low-fidelity simulations with costly but highfidelity simulations to produce an accurate model of a system of interest at minimal cost. They have proven useful in modeling physical systems and have been applied to engineering problems such as wing-design optimization. During human-in-the-loop experimentation, it has become increasingly common to use online platforms, like Mechanical Turk, to run low-fidelity experiments to gather human performance data in an efficient manner. One concern with these experiments is that the results obtained from the online environment generalize poorly to the actual domain of interest. To address this limitation, we extend traditional multi-fidelity approaches to allow us to combine fewer data points from high-fidelity human-in-the-loop experiments with plentiful but less accurate data from low-fidelity experiments to produce accurate models of how humans interact. We present both model-based and model-free methods, and summarize the predictive performance of each method under dierent conditions.

2.4AIJun 26, 2012
Predicting the behavior of interacting humans by fusing data from multiple sources

Erik J. Schlicht, Ritchie Lee, David H. Wolpert et al.

Multi-fidelity methods combine inexpensive low-fidelity simulations with costly but high-fidelity simulations to produce an accurate model of a system of interest at minimal cost. They have proven useful in modeling physical systems and have been applied to engineering problems such as wing-design optimization. During human-in-the-loop experimentation, it has become increasingly common to use online platforms, like Mechanical Turk, to run low-fidelity experiments to gather human performance data in an efficient manner. One concern with these experiments is that the results obtained from the online environment generalize poorly to the actual domain of interest. To address this limitation, we extend traditional multi-fidelity approaches to allow us to combine fewer data points from high-fidelity human-in-the-loop experiments with plentiful but less accurate data from low-fidelity experiments to produce accurate models of how humans interact. We present both model-based and model-free methods, and summarize the predictive performance of each method under different conditions.