LGMLJun 15, 2021

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

arXiv:2106.08477v1
Originality Incremental advance
AI Analysis

This work addresses uncertainty in RL for information processing scenarios, but it is incremental as it builds on existing methods with specific theoretical gains.

The paper tackles the problem of online reinforcement learning in Markov decision processes with both endogenous and exogenous uncertainty, where rewards and state transitions are unknown and time-varying, by developing a variation-aware algorithm that achieves a regret bound improvement of up to √S or S^(1/6)T^(1/12) compared to prior results.

Online reinforcement learning (RL) has been widely applied in information processing scenarios, which usually exhibit much uncertainty due to the intrinsic randomness of channels and service demands. In this paper, we consider an un-discounted RL in general Markov decision processes (MDPs) with both endogeneous and exogeneous uncertainty, where both the rewards and state transition probability are unknown to the RL agent and evolve with the time as long as their respective variations do not exceed certain dynamic budget (i.e., upper bound). We first develop a variation-aware Bernstein-based upper confidence reinforcement learning (VB-UCRL), which we allow to restart according to a schedule dependent on the variations. We successfully overcome the challenges due to the exogeneous uncertainty and establish a regret bound of saving at most $\sqrt{S}$ or $S^{\frac{1}{6}}T^{\frac{1}{12}}$ compared with the latest results in the literature, where $S$ denotes the state size of the MDP and $T$ indicates the iteration index of learning steps.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes