ML LGOct 22, 2025

On the hardness of RL with Lookahead

Corentin Pla, Hugo Richard, Marc Abeille, Nadav Merlis, Vianney Perchet

arXiv:2510.19372v17.81 citationsh-index: 23

Originality Highly original

AI Analysis

This addresses computational complexity challenges in RL for researchers, providing foundational insights into when predictive information is feasible to use optimally.

The paper tackles the problem of reinforcement learning with transition look-ahead, showing that optimal planning with one-step look-ahead is solvable in polynomial time, but becomes NP-hard for two or more steps, establishing a clear tractability boundary.

We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.

View on arXiv PDF

Similar