ML LG | Jun 20, 2017

Statistical Mechanics of Node-perturbation Learning with Noisy Baseline

Kazuyuki Hara, Kentaro Katahira, Masato Okada

arXiv:1706.06953v1 | 1 citation

Originality: Synthesis-oriented

AI Analysis

This paper provides a theoretical analysis of node-perturbation learning, a gradient estimation method for problems such as reinforcement learning in which the objective function is not explicitly formulated.

The authors developed a statistical mechanics framework for node-perturbation learning with a noisy baseline, deriving coupled differential equations of order parameters that describe the learning dynamics and yield the generalization error. They showed that Cho's results also hold in general cases and characterized the general performance of Cho's model.

Node-perturbation learning is a type of statistical gradient descent algorithm that can be applied to problems where the objective function is not explicitly formulated, including reinforcement learning. It estimates the gradient of an objective function by using the change in the objective function in response to a perturbation. The value of the objective function for an unperturbed output is called the baseline. Cho et al. proposed node-perturbation learning with a noisy baseline. In this paper, we report on building the statistical mechanics of Cho's model and on deriving coupled differential equations of order parameters that depict the learning dynamics. We also show how to derive the generalization error by solving the differential equations of order parameters. On the basis of the results, we show that Cho's results also apply in general cases and show some general performance characteristics of Cho's model.
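
The gradient estimate described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a linear student perceptron trained on a linear teacher, a squared-error objective, and Gaussian perturbations of the output node. The noisy baseline is modeled here as independent Gaussian noise added to the output when the baseline is evaluated, which is an assumption and may differ from the exact formulation of Cho et al. All names (`node_perturbation_step`, `train`, `baseline_sigma`) are hypothetical.

```python
import random


def squared_error(target, output):
    """Objective for one example: half the squared output error."""
    return 0.5 * (target - output) ** 2


def node_perturbation_step(w, x, target, eta, sigma, baseline_sigma, rng):
    """One update that uses only objective values, not an analytic gradient."""
    y = sum(wj * xj for wj, xj in zip(w, x))  # unperturbed linear output
    noise = rng.gauss(0.0, sigma)             # perturbation of the output node
    # Assumption: the noisy baseline is the objective evaluated at an output
    # corrupted by independent noise; baseline_sigma = 0 gives a clean baseline.
    base_noise = rng.gauss(0.0, baseline_sigma) if baseline_sigma > 0.0 else 0.0
    baseline = squared_error(target, y + base_noise)
    perturbed = squared_error(target, y + noise)
    # On average, (perturbed - baseline) * noise / sigma^2 equals dE/dy.
    grad_estimate = (perturbed - baseline) * noise / sigma ** 2
    return [wj - eta * grad_estimate * xj for wj, xj in zip(w, x)]


def train(n=10, steps=5000, eta=0.1, sigma=0.1, baseline_sigma=0.0, seed=0):
    """Teacher-student run; returns (initial, final) generalization error."""
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [0.0] * n

    def generalization_error(weights):
        # Inputs have i.i.d. components with variance 1/n, so the expected
        # half squared error is |teacher - weights|^2 / (2n).
        return sum((b - a) ** 2 for a, b in zip(weights, teacher)) / (2.0 * n)

    initial = generalization_error(w)
    scale = n ** -0.5
    for _ in range(steps):
        x = [rng.gauss(0.0, scale) for _ in range(n)]
        target = sum(b * xj for b, xj in zip(teacher, x))
        w = node_perturbation_step(w, x, target, eta, sigma, baseline_sigma, rng)
    return initial, generalization_error(w)


if __name__ == "__main__":
    for b_sigma in (0.0, 0.1):
        initial, final = train(baseline_sigma=b_sigma)
        print(f"baseline noise {b_sigma}: {initial:.4f} -> {final:.6f}")
```

In this sketch the baseline noise is independent of the perturbation, so it leaves the average update direction unchanged and only adds variance to each step. The learning rate, noise levels, and dimension are illustrative choices, not values from the paper.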
