James M. Hyman

48.8 NT Mar 18

Why Eight Percent of Benford Sequences Never Converge

James M. Hyman

We study multi-digit correlations in Benford sequences b^n for integer bases 2 <= b <= 1000, measuring dependence via conditional mutual information (CMI). A resonance ratio derived from the continued fraction expansion of log_10(b) classifies bases into convergent and persistent regimes (Theorem 3.13): among 996 bases surveyed, 84 (8.4%) exhibit persistent correlations at sample depth N = 10,000, and extended computation to N = 200,000 confirms 53 (5.3%) as genuinely persistent. We prove that CMI deviation is bounded by the distribution error (Theorem 3.4); exhaustive computation across 2,988 test cases confirms that the effective scaling is quadratic, yielding a two-sided rate beta = 2 for bounded-type bases (conditional on a computationally verified Hessian positivity condition). The observed effective exponent across 774 convergent bases is beta_eff = 1.72 +/- 0.19, consistent with finite-sample corrections to the asymptotic rate. We conjecture that the persistence rate converges to 1/12, a prediction grounded in the Gauss-Kuzmin distribution of partial quotients. For persistent bases, the convergence threshold N_epsilon exceeds 10^6 at standard precision, rendering the asymptotic limit observationally irrelevant within our computational scope.
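The abstract ties the classification of bases to the continued fraction expansion of log_10(b) and to the Gauss-Kuzmin distribution of partial quotients. The sketch below only illustrates those two ingredients; it does not reproduce the paper's resonance ratio, whose definition is not given in the abstract. The function names `partial_quotients` and `gauss_kuzmin`, the 60-digit working precision, and the default of 8 terms are assumptions made for illustration.

```python
from decimal import Decimal, localcontext
import math


def partial_quotients(b, n_terms=8, digits=60):
    """Leading continued fraction partial quotients of log_10(b).

    Illustrative helper (not from the paper). Uses high-precision decimal
    arithmetic so the first few quotients are not corrupted by rounding.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        x = Decimal(b).log10()
        quotients = []
        for _ in range(n_terms):
            a = int(x)          # floor, since x is positive for b >= 2
            quotients.append(a)
            frac = x - a
            if frac == 0:       # log_10(b) is an integer (b is a power of 10)
                break
            x = 1 / frac
        return quotients


def gauss_kuzmin(k):
    """Gauss-Kuzmin probability that a partial quotient equals k (k >= 1)."""
    return math.log2(1 + 1 / (k * (k + 2)))


if __name__ == "__main__":
    for base in (2, 3, 7):
        print(base, partial_quotients(base))
    print([round(gauss_kuzmin(k), 4) for k in range(1, 6)])
```

For b = 2 the expansion begins 0, 3, 3, 9, which matches the familiar convergents 1/3, 3/10, and 28/93 of log_10(2). Large partial quotients correspond to unusually good rational approximations, which is presumably the kind of structure a resonance-based classification is sensitive to.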

ML Mar 4, 2015

Quantifying Uncertainty in Stochastic Models with Parametric Variability

Kyle S. Hickmann, James M. Hyman, Sara Y. Del Valle

We present a method to quantify uncertainty in the predictions made by simulations of mathematical models that can be applied to a broad class of stochastic, discrete, and differential equation models. Quantifying uncertainty is crucial for determining how accurate the model predictions are and for identifying which input parameters affect the outputs of interest. Most existing methods for uncertainty quantification require many samples to generate accurate results, cannot distinguish where the uncertainty comes from (e.g., parameters or model assumptions), or demand substantial computational resources. Our approach addresses these challenges by accounting for different types of uncertainty: uncertainty in the input parameters as well as uncertainty created by stochastic model components. This is done by combining the Karhunen-Loeve decomposition, polynomial chaos expansion, and Bayesian Gaussian process regression to create a statistical surrogate for the stochastic model. The surrogate separates the analysis of variation arising through stochastic simulation from variation arising through uncertainty in the model parameterization. We illustrate our approach by quantifying the uncertainty in a stochastic ordinary differential equation epidemic model. Specifically, we estimate four quantities of interest for the epidemic model and show agreement between the surrogate and the actual model results.
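The separation of stochastic variation from parametric variation described above can be illustrated with the law of total variance, Var(Y) = E[Var(Y | theta)] + Var(E[Y | theta]). The sketch below is not the authors' surrogate (it uses no Karhunen-Loeve decomposition, polynomial chaos, or Gaussian process regression); it is a plain Monte Carlo illustration on a hypothetical toy model, and the names `simulate` and `decompose_variance` are assumptions introduced here.

```python
import random
import statistics


def simulate(theta, rng):
    """Hypothetical toy stochastic model: the parameter plus unit Gaussian noise."""
    return theta + rng.gauss(0.0, 1.0)


def decompose_variance(thetas, n_reps, seed=0):
    """Split output variance into stochastic and parametric parts.

    Runs n_reps replicates at each parameter value and returns
    (within, between, total), where
      within  = mean over parameters of the replicate variance (stochastic part),
      between = variance of the per-parameter means (parametric part),
      total   = variance of all outputs pooled together.
    Population variances are used, so with equal replicate counts
    within + between equals total up to rounding.
    """
    rng = random.Random(seed)
    runs = [[simulate(t, rng) for _ in range(n_reps)] for t in thetas]
    group_means = [statistics.fmean(r) for r in runs]
    within = statistics.fmean([statistics.pvariance(r) for r in runs])
    between = statistics.pvariance(group_means)
    total = statistics.pvariance([y for r in runs for y in r])
    return within, between, total


if __name__ == "__main__":
    w, b, t = decompose_variance([0.0, 1.0, 2.0, 3.0], n_reps=500)
    print(f"stochastic={w:.3f} parametric={b:.3f} total={t:.3f}")
```

A brute-force split like this needs many replicates at every parameter value, which is the sampling cost that a statistical surrogate is meant to reduce.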


2 Papers