ML, Jun 10, 2022
Dynamic mean field programming
George Stamatescu
A dynamic mean field theory is developed for finite state and action Bayesian reinforcement learning in the large state space limit. In an analogy with statistical physics, the Bellman equation is studied as a disordered dynamical system; the Markov decision process transition probabilities are interpreted as couplings and the value functions as deterministic spins that evolve dynamically. Thus, the mean-rewards and transition probabilities are considered to be quenched random variables. The theory reveals that, under certain assumptions, the state-action values are statistically independent across state-action pairs in the asymptotic state space limit, and provides the form of the distribution exactly. The results hold in the finite and discounted infinite horizon settings, for both value iteration and policy evaluation. The state-action value statistics can be computed from a set of mean field equations, which we call dynamic mean field programming (DMFP). For policy evaluation the equations are exact. For value iteration, approximate equations are obtained by appealing to extreme value theory or bounds. The result provides analytic insight into the statistical structure of tabular reinforcement learning, for example revealing the conditions under which reinforcement learning is equivalent to a set of independent multi-armed bandit problems.
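The policy evaluation setting described in this abstract can be illustrated with a small sketch. It draws a fixed ("quenched") random Markov decision process and iterates the Bellman equation for the state-action values of a uniform policy. The Gaussian mean rewards, the normalised-uniform transition rows, and all function names here are illustrative assumptions; they are not the paper's distributional assumptions, and the DMFP equations themselves are not reproduced.

```python
import random


def random_mdp(n_states, n_actions, rng):
    """Draw one fixed random MDP: mean rewards r[s][a] and rows P[s][a][s']."""
    rewards = [[rng.gauss(0.0, 1.0) for _ in range(n_actions)]
               for _ in range(n_states)]
    transitions = []
    for _ in range(n_states):
        per_action = []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            per_action.append([w / total for w in weights])  # row sums to 1
        transitions.append(per_action)
    return rewards, transitions


def uniform_policy(n_states, n_actions):
    """Policy pi[s][a] that picks every action with equal probability."""
    return [[1.0 / n_actions] * n_actions for _ in range(n_states)]


def bellman_backup(q, rewards, transitions, policy, gamma):
    """One sweep of Q(s,a) <- r(s,a) + gamma * sum_s' P(s'|s,a) V(s')."""
    n_states, n_actions = len(rewards), len(rewards[0])
    # V(s') = sum_a' pi(a'|s') Q(s',a')
    v = [sum(policy[s][a] * q[s][a] for a in range(n_actions))
         for s in range(n_states)]
    return [[rewards[s][a]
             + gamma * sum(p * v_next
                           for p, v_next in zip(transitions[s][a], v))
             for a in range(n_actions)]
            for s in range(n_states)]


def policy_evaluation(rewards, transitions, policy, gamma, n_iters):
    """Iterate the Bellman backup from Q = 0 for n_iters sweeps."""
    q = [[0.0] * len(rewards[0]) for _ in range(len(rewards))]
    for _ in range(n_iters):
        q = bellman_backup(q, rewards, transitions, policy, gamma)
    return q


if __name__ == "__main__":
    rng = random.Random(0)
    rewards, transitions = random_mdp(n_states=50, n_actions=3, rng=rng)
    policy = uniform_policy(50, 3)
    q = policy_evaluation(rewards, transitions, policy, gamma=0.9, n_iters=300)
    flat = [value for row in q for value in row]
    mean_q = sum(flat) / len(flat)
    print("empirical mean of Q over state-action pairs:", mean_q)
```

Repeating the draw with a larger `n_states` gives an empirical distribution of state-action values that could be compared against a mean field prediction; the sketch only produces the samples.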
ML, Feb 1, 2019
Critical initialisation in continuous approximations of binary neural networks
George Stamatescu, Federica Gerace, Carlo Lucibello et al.
The training of stochastic neural network models with binary ($\pm1$) weights and activations via continuous surrogate networks is investigated. We derive new surrogates by writing the stochastic neural network as a Markov chain; this derivation also encompasses existing variants of the surrogates presented in the literature. Following this, we theoretically study the surrogates at initialisation. We derive, using mean field theory, a set of scalar equations describing how input signals propagate through the randomly initialised networks. The equations reveal whether so-called critical initialisations exist for each surrogate network, where the network can be trained to arbitrary depth. Moreover, we predict theoretically, and confirm numerically, that common weight initialisation schemes used in standard continuous networks, when applied to the mean values of the stochastic binary weights, yield poor training performance. This study shows that, contrary to common intuition, the means of the stochastic binary weights should be initialised close to $\pm 1$ for deeper networks to be trainable.
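To give a flavour of what a scalar signal propagation equation looks like, the sketch below iterates the well-known mean field variance map for a standard continuous tanh network with Gaussian weights, $q^{l+1} = \sigma_w^2 \, \mathbb{E}_{z \sim \mathcal{N}(0,1)}[\tanh^2(\sqrt{q^l} z)] + \sigma_b^2$. This is an assumption made for illustration: it is the equation for an ordinary continuous network, not the surrogate-specific equations derived in the paper, and the function names and the trapezoidal quadrature are hypothetical choices.

```python
import math


def gaussian_expectation(f, n_points=2001, z_max=8.0):
    """Approximate E[f(z)], z ~ N(0, 1), by the trapezoidal rule on [-z_max, z_max]."""
    step = 2.0 * z_max / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        z = -z_max + i * step
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end points
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)


def variance_map(q, sigma_w2, sigma_b2):
    """One layer of the variance recursion for a tanh network (assumed form)."""
    root_q = math.sqrt(q)
    second_moment = gaussian_expectation(lambda z: math.tanh(root_q * z) ** 2)
    return sigma_w2 * second_moment + sigma_b2


def iterate_variance(q0, sigma_w2, sigma_b2, depth):
    """Propagate the pre-activation variance q through `depth` layers."""
    q = q0
    for _ in range(depth):
        q = variance_map(q, sigma_w2, sigma_b2)
    return q


if __name__ == "__main__":
    for sigma_w2 in (0.5, 1.0, 2.0):
        q_deep = iterate_variance(1.0, sigma_w2, 0.0, depth=100)
        print("sigma_w^2 =", sigma_w2, "variance after 100 layers:", q_deep)
```

With zero bias variance, the signal variance in this toy map decays to zero when $\sigma_w^2 < 1$ and settles at a positive fixed point when $\sigma_w^2 > 1$, which is the kind of qualitative distinction a critical initialisation analysis relies on.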
CV, May 13, 2016
Track Extraction with Hidden Reciprocal Chain Models
George Stamatescu, Langford B White, Riley Bruce-Doust
This paper develops Bayesian track extraction algorithms for targets modelled as hidden reciprocal chains (HRC). HRC are a class of finite-state random process models that generalise the familiar hidden Markov chains (HMC). HRC are able to model the "intention" of a target to proceed from a given origin to a destination, behaviour which cannot be properly captured by an HMC. While Bayesian estimation problems for HRC have previously been studied, this paper focusses principally on the problem of track extraction, of which the primary task is confirming target existence in a set of detections obtained from thresholding sensor measurements. Simulation examples are presented which show that the additional model information contained in an HRC improves detection performance when compared to HMC models.