Optimal Learning for Stochastic Optimization with Nonlinear Parametric Belief Models
This work addresses expensive sequential experimentation in stochastic optimization, a problem faced by researchers and practitioners in fields such as operations research and machine learning. It is an incremental improvement that adapts sampled approximations to nonlinear belief models.
The paper tackles the problem of estimating the expected value of information for Bayesian learning with nonlinear parametric belief models, with the aim of maximizing a metric while learning the model parameters through sequential experimentation. It introduces a resampling method that is shown to converge asymptotically to the true parameters and that, empirically, converges rapidly with a small number of experiments.
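For reference, the standard knowledge-gradient definition can be written for a sampled belief model as follows. The notation is assumed for illustration and may not match the paper's: K candidate parameter vectors θ_k carry probabilities p_k^n after n experiments, S^n is the belief state, observations carry Gaussian noise with known σ, and φ is the standard normal density.

```latex
\bar{f}^n(x) = \sum_{k=1}^{K} p_k^n \, f(x;\theta_k)

\nu^{KG,n}(x) = \mathbb{E}\!\left[\max_{x'} \bar{f}^{n+1}(x') \,\middle|\, S^n,\; x^n = x\right] - \max_{x'} \bar{f}^{n}(x')

p_k^{n+1} = \frac{p_k^n \, \phi\!\left((\hat{y}^{n+1} - f(x^n;\theta_k))/\sigma\right)}{\sum_{j=1}^{K} p_j^n \, \phi\!\left((\hat{y}^{n+1} - f(x^n;\theta_j))/\sigma\right)}
```

Under these assumptions the expectation is over the single outcome ŷ^{n+1}, whose predictive distribution is a mixture of N(f(x;θ_k), σ²) components weighted by p_k^n. This one-dimensional integral is one reason a sampled approximation is tractable where the exact computation over a continuous parameter space is not.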
We consider the problem of estimating the expected value of information (the knowledge gradient) for Bayesian learning problems in which the belief model is nonlinear in the parameters. Our goal is to maximize some metric while simultaneously learning the unknown parameters of the nonlinear belief model, by guiding an expensive sequential experimentation process. Computing the expected value of an experiment is computationally intractable, so we use a sampled approximation; this helps to guide experiments but does not provide an accurate estimate of the unknown parameters. We then introduce a resampling process that allows the sampled model to adapt to new information by exploiting past experiments. We show theoretically that the method converges asymptotically to the true parameters while simultaneously maximizing our metric. We show empirically that the process converges rapidly, yielding good results with a very small number of experiments.
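A minimal Python sketch of the loop described in the abstract follows. It rests on several assumptions. The belief model f(x; θ) = h·exp(-(x - c)²) with θ = (h, c) is hypothetical. The noise is Gaussian with known σ. The candidate set is discrete, with K parameter vectors and weights p_k. The resampling rule is simplified: jittered draws from the current weights, rescored by the likelihood of the full experiment history, triggered when the effective sample size drops below K/2. The names `f`, `update_weights`, `knowledge_gradient`, `resample`, and `run_demo` are hypothetical, and the resampling rule is a stand-in for illustration, not the paper's procedure.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

SIGMA = 0.5  # assumed known std dev of the observation noise (hypothetical)


def f(xs, theta):
    """Hypothetical nonlinear belief model f(x; theta) = h * exp(-(x - c)^2).

    xs has shape (M,), theta has shape (K, 2) with rows (h, c).
    Returns the (K, M) matrix of predicted values.
    """
    h, c = theta[:, [0]], theta[:, [1]]
    return h * np.exp(-(xs[None, :] - c) ** 2)


def update_weights(p, pred, y, sigma=SIGMA):
    """Bayes update of candidate weights after observing y; pred[k] = f(x; theta_k)."""
    logw = np.log(np.clip(p, 1e-300, None)) - 0.5 * ((y - pred) / sigma) ** 2
    w = np.exp(logw - logw.max())
    return w / w.sum()


def knowledge_gradient(p, theta, xs, sigma=SIGMA, n_quad=20):
    """Sampled KG: E[max_x' fbar^{n+1}(x')] - max_x' fbar^n(x') for each x in xs."""
    F = f(xs, theta)  # (K, M)
    best_now = (p @ F).max()
    z, wq = hermegauss(n_quad)
    wq = wq / np.sqrt(2.0 * np.pi)  # now E[g(Z)] ~= sum(wq * g(z)) for Z ~ N(0, 1)
    logp = np.log(np.clip(p, 1e-300, None))
    kg = np.empty(len(xs))
    for m in range(len(xs)):
        # Outcome grid: if candidate j is true, y ~ N(F[j, m], sigma^2); shape (K_j, Q)
        y = F[:, m][:, None] + sigma * z[None, :]
        # Posterior weight of candidate k for each outcome (j, q); shape (K_k, K_j, Q)
        logw = logp[:, None, None] - 0.5 * ((y[None] - F[:, m][:, None, None]) / sigma) ** 2
        w = np.exp(logw - logw.max(axis=0, keepdims=True))
        w /= w.sum(axis=0, keepdims=True)
        # Best posterior-mean value after each outcome; shape (K_j, Q)
        best_next = np.einsum("kjq,km->jqm", w, F).max(axis=2)
        # Average over the predictive mixture (weights p_j) and quadrature nodes
        kg[m] = np.sum(p[:, None] * wq[None, :] * best_next) - best_now
    return kg


def resample(theta, p, history, rng, jitter=0.1, sigma=SIGMA):
    """Simplified resampling (an assumption, not the paper's rule): draw jittered
    copies of candidates in proportion to p, then weight the new set by its
    likelihood on all past experiments, i.e. a uniform prior over the new set."""
    K = len(p)
    idx = rng.choice(K, size=K, p=p)
    new_theta = theta[idx] + jitter * rng.standard_normal(theta.shape)
    x_hist = np.array([x for x, _ in history])
    y_hist = np.array([y for _, y in history])
    resid = (y_hist[None, :] - f(x_hist, new_theta)) / sigma  # (K, N)
    loglik = -0.5 * np.sum(resid ** 2, axis=1)
    w = np.exp(loglik - loglik.max())
    return new_theta, w / w.sum()


def run_demo(seed=0, n_steps=15, K=20):
    """Sequential experiments: measure the KG-maximizing x, update, resample if needed."""
    rng = np.random.default_rng(seed)
    xs = np.linspace(0.0, 3.0, 31)  # candidate experiments
    theta_true = np.array([[1.0, 2.0]])  # hypothetical true parameters (h, c)
    theta = np.column_stack([rng.uniform(0.5, 1.5, K), rng.uniform(0.0, 3.0, K)])
    p = np.full(K, 1.0 / K)
    history = []
    for _ in range(n_steps):
        x = xs[np.argmax(knowledge_gradient(p, theta, xs))]
        y = f(np.array([x]), theta_true)[0, 0] + SIGMA * rng.standard_normal()
        history.append((x, y))
        p = update_weights(p, f(np.array([x]), theta)[:, 0], y)
        if 1.0 / np.sum(p ** 2) < K / 2:  # effective sample size has collapsed
            theta, p = resample(theta, p, history, rng)
    return theta, p, history


if __name__ == "__main__":
    theta, p, history = run_demo()
    print("posterior-mean estimate of (h, c):", p @ theta)
```

Gauss-Hermite quadrature integrates over the outcome y, whose predictive distribution is a mixture of Gaussians centered on f(x; θ_j). Monte Carlo sampling of y would also work, at the cost of higher variance in the KG estimates. Rescoring the resampled candidates against the full history is what lets the candidate set move toward regions the original sample did not cover while still exploiting every past experiment.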