LG AI DS MLJul 13, 2023

The complexity of non-stationary reinforcement learning

arXiv:2307.06877v18.88 citationsh-index: 17

Originality Incremental advance

AI Analysis

This addresses the problem of computational inefficiency in adapting reinforcement learning models to changes, which is crucial for real-world applications with large state spaces, though it is incremental in providing theoretical insights rather than practical solutions.

The paper tackles the challenge of continual learning in non-stationary reinforcement learning by proving a worst-case complexity result: updating the value function after modifying a single state-action pair requires time nearly proportional to the number of states, unless the strong exponential time hypothesis (SETH) is false, while adding a new state-action pair is shown to be easier.

The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.

View on arXiv PDF

Similar