LG · AI · Sep 26, 2022

Delayed Geometric Discounts: An Alternative Criterion for Reinforcement Learning

arXiv:2209.12483v1 · h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses sample inefficiency and exploration challenges in RL for tasks where future returns are not exponentially less valuable. It appears incremental, however, because it generalizes the existing discounted formulation.

The paper tackles the limitations of geometric discounts in reinforcement learning by introducing delayed objective functions. The resulting algorithms solve hard exploration problems in tabular environments and improve sample efficiency on simulated robotics benchmarks.

The endeavor of artificial intelligence (AI) is to design autonomous agents capable of achieving complex tasks. In particular, reinforcement learning (RL) provides a theoretical framework for learning optimal behaviors. In practice, RL algorithms rely on geometric discounts to evaluate this optimality. Unfortunately, this does not cover decision processes where future returns are not exponentially less valuable. Depending on the problem, this limitation induces sample inefficiency (as feedback is exponentially decayed) and requires additional curricula or exploration mechanisms (to deal with sparse, deceptive or adversarial rewards). In this paper, we tackle these issues by generalizing the discounted problem formulation with a family of delayed objective functions. We investigate the underlying RL problem to derive: 1) the optimal stationary solution and 2) an approximation of the optimal non-stationary control. The devised algorithms solved hard exploration problems in tabular environments and improved sample efficiency on classic simulated robotics benchmarks.
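To make the contrast in the abstract concrete, the following is a minimal sketch comparing a standard geometric discount with one possible "delayed" discount profile. The delayed profile is an assumption chosen for illustration: it is built as the distribution of a sum of independent geometric waiting times (a convolution of geometric discounts). It is not claimed to be the paper's exact definition. The helpers `geometric_weights`, `delayed_weights` and `discounted_return` are hypothetical names, not the authors' code. With a sparse reward far in the future, the geometric weight decays exponentially from t = 0, while the delayed profile peaks later and gives that reward more weight.

```python
from typing import List, Sequence


def geometric_weights(gamma: float, horizon: int) -> List[float]:
    """Normalized geometric discount (1 - gamma) * gamma**t for t = 0..horizon-1.

    The (1 - gamma) factor only rescales the usual gamma**t weights so that
    they sum to 1 over an infinite horizon.
    """
    return [(1 - gamma) * gamma ** t for t in range(horizon)]


def delayed_weights(gammas: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical 'delayed' discount: the distribution of a sum of
    independent geometric waiting times, one per factor in `gammas`.

    ASSUMPTION: this convolution construction is for illustration only and
    is not presented as the paper's exact definition.
    """
    weights = [1.0] + [0.0] * (horizon - 1)  # point mass at t = 0
    for gamma in gammas:
        factor = geometric_weights(gamma, horizon)
        # Truncated discrete convolution: each factor shifts mass to later steps.
        weights = [
            sum(weights[j] * factor[t - j] for j in range(t + 1))
            for t in range(horizon)
        ]
    return weights


def discounted_return(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of rewards under a given discount profile."""
    return sum(w * r for w, r in zip(weights, rewards))


if __name__ == "__main__":
    horizon = 100
    # Sparse reward: a single unit reward at step 50.
    rewards = [0.0] * horizon
    rewards[50] = 1.0

    geo = geometric_weights(0.9, horizon)
    delayed = delayed_weights([0.9, 0.9], horizon)

    print("geometric return:", discounted_return(rewards, geo))
    print("delayed return:  ", discounted_return(rewards, delayed))
    # The geometric profile is largest at t = 0; the delayed profile peaks later.
    print("delayed peak at t =", max(range(horizon), key=delayed.__getitem__))
```

With a single factor, `delayed_weights` recovers the normalized geometric discount. With two equal factors γ, it reduces to a negative binomial profile with weight (t + 1)(1 − γ)² γ^t at step t. For γ = 0.9, that weight is about 51 times the geometric one divided by 10 at t = 50, so the distant reward contributes roughly five times more.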

