LG ML · Jun 4, 2024

Reinforcement Learning with Lookahead Information

arXiv:2406.02258v210.46 citations

Originality: Incremental advance

AI Analysis

This work addresses a gap in RL for unknown environments with lookahead information. The advance is rated incremental because it extends existing methods so that they can use observations revealed before acting.

The paper studies reinforcement learning with lookahead information, where agents observe reward or transition realizations at their current state before acting. It designs provably efficient algorithms that use this information and achieve tight regret against a baseline that also has lookahead access, which yields a linear increase in collected reward over agents that cannot use lookahead.

We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take. Such observations are available in many applications, including transactions, navigation and more. When the environment is known, previous work shows that this lookahead information can drastically increase the collected reward. However, outside of specific applications, existing approaches for interacting with unknown environments are not well-adapted to these observations. In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve tight regret versus a baseline that also has access to lookahead information - linearly increasing the amount of collected reward compared to agents that cannot handle lookahead information.
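The difference between planning with the empirical distribution and planning with estimated expectations can be seen in a one-step toy problem. The sketch below is an illustration only, not the paper's algorithm: it assumes a single state, two actions, and no transitions or exploration, and the function names and sample data are hypothetical. With reward lookahead, the agent sees the realized rewards before choosing, so the quantity to estimate is the mean of the per-sample maximum instead of the maximum of the per-action means.

```python
def value_from_expectations(samples):
    """Value when planning with estimated mean rewards only: max_a mean(R[a])."""
    n = len(samples)
    num_actions = len(samples[0])
    means = [sum(row[a] for row in samples) / n for a in range(num_actions)]
    return max(means)


def value_from_empirical_distribution(samples):
    """Value when planning with the empirical joint distribution of rewards.

    A lookahead agent observes the realized rewards before acting and picks
    the best action for that realization, so the value is mean(max_a R[a]).
    """
    return sum(max(row) for row in samples) / len(samples)


# Hypothetical data: each row is one joint reward realization for two actions.
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

no_lookahead = value_from_expectations(samples)
with_lookahead = value_from_empirical_distribution(samples)
print(no_lookahead, with_lookahead)
```

In this toy data both actions have the same mean reward, so an agent that plans with expectations cannot do better than that mean, while an agent that sees the realization first always picks the rewarding action. The mean of a maximum is never smaller than the maximum of the means, which is why the expectations alone are not enough to plan with lookahead.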
