LG · Jun 3, 2021

Hyperbolically-Discounted Reinforcement Learning on Reward-Punishment Framework

arXiv:2106.01516v1
Originality: Incremental advance
AI Analysis

This work addresses reinforcement learning efficiency for AI systems, but it appears incremental as it builds on existing discounting and reward-punishment concepts.

The paper tackles the problem of improving reinforcement learning by introducing hyperbolic discounting within a reward-punishment framework, resulting in a new scheme that outperforms standard methods in simulations, though performance varies with reward-punishment design.

This paper proposes a new reinforcement learning method with hyperbolic discounting. A new scheme for learning the optimal policy is derived by combining a new temporal difference error, which applies hyperbolic discounting in a recursive manner, with the reward-punishment framework. In simulations, the proposal outperforms standard reinforcement learning, although its performance depends on the design of reward and punishment. In addition, the average discount factors for reward and for punishment differ from each other, resembling the sign effect in animal behaviors.
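For readers unfamiliar with the terms, the sketch below contrasts textbook exponential discounting with the textbook hyperbolic form 1 / (1 + k t), and shows one common way to split a scalar reward into separate reward and punishment signals. It is an illustration only, not the paper's recursive temporal difference scheme: the function names, the parameter `k`, and the sign-based split are assumptions introduced here.

```python
def exponential_discount(t, gamma=0.9):
    """Standard RL discount weight gamma**t for a delay of t steps."""
    return gamma ** t


def hyperbolic_discount(t, k=1.0):
    """Textbook hyperbolic discount weight 1 / (1 + k*t).

    Assumption: this is the generic form, not the paper's recursive variant.
    """
    return 1.0 / (1.0 + k * t)


def effective_factor(t, k=1.0):
    """Per-step ratio d(t+1) / d(t) of the hyperbolic weights.

    Unlike a fixed gamma, this ratio grows toward 1 as the delay t increases.
    """
    return hyperbolic_discount(t + 1, k) / hyperbolic_discount(t, k)


def discounted_return(rewards, discount):
    """Sum of rewards weighted by the given discount function of the delay."""
    return sum(discount(t) * r for t, r in enumerate(rewards))


def split_reward(r):
    """Hypothetical reward-punishment split by sign.

    Returns (reward, punishment), both nonnegative, so each could be
    learned with its own value function and discounting.
    """
    return max(r, 0.0), max(-r, 0.0)


if __name__ == "__main__":
    rewards = [1.0, -2.0, 1.0]
    print(discounted_return(rewards, exponential_discount))
    print(discounted_return(rewards, hyperbolic_discount))
    print([split_reward(r) for r in rewards])
```

Under this form the hyperbolic weight falls quickly at short delays but has a heavier tail than the exponential one, which is why the effective per-step factor is not constant.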

