OC LG ML

Mar 7, 2020

Convergence of Q-value in case of Gaussian rewards

Konatsu Miyamoto, Masaya Suzuki, Yuma Kigami, Kodai Satake

arXiv:2003.03526v1

1 citation

Originality: Synthesis-oriented

AI Analysis

This addresses a theoretical gap for reinforcement learning applications where rewards follow Gaussian distributions, such as in distributional or Bayesian RL, though it appears incremental to existing convergence proofs.

The paper proves convergence of Q-functions for reinforcement learning with Gaussian-distributed rewards, establishing convergence under the relaxed condition E[r(s,a)^2] < ∞, which is less restrictive than previous requirements.

In this paper, as a study of reinforcement learning, we prove convergence of the Q-function for unbounded rewards, such as rewards drawn from a Gaussian distribution. By the central limit theorem, it is natural in some real-world applications to assume that rewards follow a Gaussian distribution, but existing proofs do not guarantee convergence of the Q-function in that case. Furthermore, in distributional reinforcement learning and Bayesian reinforcement learning, which have become popular in recent years, it is preferable to allow the reward to follow a Gaussian distribution. Therefore, in this paper, we prove the convergence of the Q-function under the condition $E[r(s,a)^2]<\infty$, which is much weaker than the conditions assumed in existing research. Finally, as a bonus, we also include a proof of the policy gradient theorem for distributional reinforcement learning.
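The setting described in the abstract can be illustrated empirically with a small sketch. This is not the paper's proof or method. It is a hypothetical two-state, two-action MDP (the names `q_star`, `q_learning_gaussian`, `mean_reward`, and `next_state` are assumptions made for illustration) in which every reward is Gaussian, so rewards are unbounded but $E[r(s,a)^2] = \mu^2 + \sigma^2 < \infty$. Tabular Q-learning with a decaying step size is compared against the optimal Q-values computed from the mean rewards.

```python
import numpy as np


def q_star(mean_reward, next_state, gamma, iters=1000):
    """Optimal Q-values by value iteration on the mean rewards (deterministic transitions)."""
    q = np.zeros_like(mean_reward, dtype=float)
    for _ in range(iters):
        # q[next_state] has shape (S, A, A); take the max over next actions.
        q = mean_reward + gamma * q[next_state].max(axis=-1)
    return q


def q_learning_gaussian(mean_reward, next_state, gamma, sigma=1.0,
                        steps=100_000, seed=0):
    """Tabular Q-learning where r(s, a) ~ Normal(mean_reward[s, a], sigma^2)."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = mean_reward.shape
    q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions)               # uniform exploration
        r = rng.normal(mean_reward[s, a], sigma)  # unbounded Gaussian reward
        s_next = next_state[s, a]
        visits[s, a] += 1
        # Step size n^(-0.7): its sum diverges and its sum of squares converges.
        alpha = visits[s, a] ** -0.7
        q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
        s = s_next
    return q


# Hypothetical MDP: action 0 always leads to state 0, action 1 to state 1.
mean_reward = np.array([[1.0, 0.0], [2.0, -1.0]])
next_state = np.array([[0, 1], [0, 1]])
gamma = 0.5

q_opt = q_star(mean_reward, next_state, gamma)
q_hat = q_learning_gaussian(mean_reward, next_state, gamma)
print("max |Q_hat - Q*| =", np.max(np.abs(q_hat - q_opt)))
```

The step-size schedule and uniform exploration are choices made for this sketch only; the abstract does not specify which schedule or exploration scheme the proof assumes.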
