Using Echo State Networks to Approximate Value Functions for Control
This work addresses the challenge of efficient reinforcement learning in non-Markovian settings and offers a novel algorithmic basis for it, although the contribution may be incremental because it builds on existing ESN methods.
The paper tackles the problem of approximating value functions for non-Markovian control problems by proving that a sufficiently large Echo State Network (ESN) can do so under mild conditions. It demonstrates, with the 'Bee World' game and a market-making problem as examples, that the resulting algorithms yield good policies after a single reinforcement policy iteration.
An Echo State Network (ESN) is a type of single-layer recurrent neural network with randomly-chosen internal weights and a trainable output layer. We prove under mild conditions that a sufficiently large Echo State Network can approximate the value function of a broad class of stochastic and deterministic control problems. Such control problems are generally non-Markovian. We describe how the ESN can form the basis for novel and computationally efficient reinforcement learning algorithms in a non-Markovian framework. We demonstrate this theory with two examples. In the first, we use an ESN to solve a deterministic, partially observed control problem, a simple game we call 'Bee World'. In the second, we consider a stochastic control problem inspired by a market-making problem in mathematical finance. In both cases we can compare the dynamics of the algorithms with analytic solutions to show that, even after only a single reinforcement policy iteration, the algorithms arrive at a good policy.
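The ESN structure described in the abstract (fixed random internal weights, with only the output layer trained) can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the tanh activation, the spectral-radius rescaling, the ridge-regression readout, the toy target, and all function names and parameter values below are assumptions chosen for illustration. The target depends on both the current and the previous observation, so it stands in for a quantity that cannot be computed from the current observation alone.

```python
import numpy as np


def make_reservoir(n_inputs, n_units, spectral_radius=0.9, seed=0):
    """Draw fixed random reservoir and input weights (never trained)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_units, n_units))
    # Assumption: rescale to a spectral radius below 1, a common heuristic
    # for making the reservoir forget its initial state.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    return W, W_in


def run_reservoir(W, W_in, inputs):
    """Drive the reservoir with a sequence of observations; return all states."""
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        # The state is a function of the whole observation history.
        x = np.tanh(W @ x + W_in @ u)
        states.append(x)
    return np.array(states)


def fit_readout(states, targets, ridge=1e-6):
    """Train only the linear output layer, here by ridge regression."""
    n_units = states.shape[1]
    gram = states.T @ states + ridge * np.eye(n_units)
    return np.linalg.solve(gram, states.T @ targets)


# Toy data (hypothetical): the target mixes the current and previous observation.
rng = np.random.default_rng(1)
obs = rng.uniform(-1.0, 1.0, size=(500, 1))
targets = obs[1:, 0] + 0.5 * obs[:-1, 0]

W, W_in = make_reservoir(n_inputs=1, n_units=50)
states = run_reservoir(W, W_in, obs)[1:]  # align states with targets
w_out = fit_readout(states, targets)
mse = float(np.mean((states @ w_out - targets) ** 2))
print("training MSE:", mse)
```

Because only `w_out` is fitted, training reduces to a single linear solve, which is one reason an ESN-based approach can be computationally cheap. In a reinforcement learning setting the regression targets would be value estimates rather than this toy signal.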