ML · LG · Mar 24, 2023

Sequential Knockoffs for Variable Selection in Reinforcement Learning

arXiv:2303.14281v2 · 8 citations · h-index: 34
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of state representation in real-world reinforcement learning and offers a method to improve policy learning efficiency. The advance is incremental because it builds on existing knockoff techniques.

The paper tackles the problem of identifying a minimal sufficient state representation in reinforcement learning, so that unnecessarily high-dimensional states do not slow learning. It introduces the SEEK algorithm, which achieves selection consistency in large samples and outperforms competing methods in variable selection accuracy and regret.

In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state may slow learning and obfuscate the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP) as the subvector of the original state under which the process remains an MDP and shares the same reward function as the original process. We propose a novel SEquEntial Knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics. In large samples, the proposed method achieves selection consistency. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy learning. Empirical experiments verify theoretical results and show the proposed approach outperforms several competing methods regarding variable selection accuracy and regret.
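To make the knockoff idea concrete, the following is a minimal sketch of a single generic knockoff-filter pass on toy data. It is not the SEEK algorithm from the paper. It only illustrates how knockoff copies of state variables can separate components that help predict the reward from pure noise components. Everything here is an assumption made for illustration: the function names (`knockoff_threshold`, `knockoff_select`, `toy_transitions`), the linear reward model, the least-squares importance statistic, and above all the independent standard-normal state variables, which is what makes fresh standard-normal draws valid knockoffs.

```python
import numpy as np


def knockoff_threshold(W, q):
    """Knockoff+ threshold: the smallest t whose estimated false
    discovery proportion (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) is <= q."""
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")  # no threshold meets the target level: select nothing


def knockoff_select(S, y, q=0.25, seed=0):
    """One knockoff-filter pass: which columns of S help predict y?

    Assumption: the columns of S are independent N(0, 1), so an independent
    N(0, 1) draw is a valid knockoff copy. Correlated or non-Gaussian states
    need a proper knockoff sampler instead.
    """
    rng = np.random.default_rng(seed)
    n, p = S.shape
    S_knock = rng.standard_normal((n, p))

    # Fit the response on originals and knockoffs jointly.
    X = np.hstack([S, S_knock])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Importance statistic: a large positive W_j means the original variable
    # matters more than its knockoff; null variables get a symmetric W_j.
    W = np.abs(coef[:p]) - np.abs(coef[p:])
    t = knockoff_threshold(W, q)
    return np.flatnonzero(W >= t), W


def toy_transitions(n=2000, p=20, k=5, seed=1):
    """Hypothetical data: only the first k of p state variables drive the reward."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((n, p))
    reward = S[:, :k].sum(axis=1) + rng.standard_normal(n)
    return S, reward


S, reward = toy_transitions()
selected, W = knockoff_select(S, reward)
print("selected state indices:", selected)
```

In this toy setup the five reward-driving components receive large positive statistics, while the fifteen noise components receive small statistics of either sign, which is what the threshold exploits. A sequential procedure for an MDP would also have to account for which components are needed to predict the next values of the already-selected components, which this single pass does not attempt.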
