ML LG EM | Feb 17, 2023

Post Reinforcement Learning Inference

arXiv:2302.08854v59.84 citations | h-index: 39 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable statistical inference in reinforcement learning applications, such as healthcare or policy evaluation, by providing a method for adaptively collected data. Its contribution is incremental: it extends existing moment-based techniques to RL settings.

The paper tackles the problem of performing statistical inference on data collected by reinforcement learning algorithms, which is challenging due to adaptive and nonstationary data collection. It proposes a weighted generalized method of moments approach that stabilizes time-varying variance, enabling consistent and asymptotically normal estimators for tasks like dynamic treatment effect estimation.

We study estimation and inference using data collected by reinforcement learning (RL) algorithms. These algorithms adaptively experiment by interacting with individual units over multiple stages, updating their strategies based on past outcomes. Our goal is to evaluate a counterfactual policy after data collection and estimate structural parameters, such as dynamic treatment effects, that support credit assignment and quantify the impact of early actions on final outcomes. These parameters can often be defined as solutions to moment equations, motivating moment-based estimation methods developed for static data. In RL settings, however, data are often collected adaptively under nonstationary behavior policies. As a result, standard estimators fail to achieve asymptotic normality due to time-varying variance. We propose a weighted generalized method of moments (GMM) approach that uses adaptive weights to stabilize this variance. We characterize weighting schemes that ensure consistency and asymptotic normality of the weighted GMM estimators, enabling valid hypothesis testing and uniform confidence region construction. Key applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
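The variance-stabilizing idea can be illustrated with a toy sketch. This is not the paper's estimator: it assumes a single-stage, two-armed bandit with known propensities, a hypothetical greedy behavior policy with an exploration floor, and a scalar moment condition for the mean reward of arm 1. The names `simulate_bandit` and `weighted_moment_estimate` are made up for illustration. Under these assumptions the inverse-propensity score has conditional variance proportional to 1/e_t, so weights h_t = sqrt(e_t) keep the weighted score's variance roughly constant over time, whereas uniform weights do not.

```python
import numpy as np


def simulate_bandit(n, rng, means=(0.0, 0.0), floor=0.05):
    """Toy two-armed bandit with an adaptive behavior policy (assumed setup).

    Returns the actions, the rewards, and the propensity of arm 1 at each step.
    """
    actions = np.zeros(n, dtype=int)
    rewards = np.zeros(n)
    prop1 = np.zeros(n)
    sums = np.zeros(2)
    counts = np.zeros(2)
    for t in range(n):
        if counts.min() == 0:
            # Randomize evenly until both arms have been observed.
            e1 = 0.5
        else:
            # Favor the arm that currently looks better, so the propensity
            # depends on past outcomes and changes over time.
            est = sums / counts
            e1 = 1.0 - floor if est[1] > est[0] else floor
        a = int(rng.random() < e1)
        y = means[a] + rng.standard_normal()
        actions[t], rewards[t], prop1[t] = a, y, e1
        sums[a] += y
        counts[a] += 1
    return actions, rewards, prop1


def weighted_moment_estimate(actions, rewards, prop1, weights):
    """Solve sum_t h_t * (A_t / e_t) * (Y_t - theta) = 0 for theta."""
    score = weights * actions / prop1
    return float(np.sum(score * rewards) / np.sum(score))


rng = np.random.default_rng(0)
a, y, e = simulate_bandit(2000, rng)

# Uniform weights: the standard inverse-propensity moment estimator.
theta_uniform = weighted_moment_estimate(a, y, e, np.ones_like(e))

# Square-root propensity weights: one variance-stabilizing choice for this toy model.
theta_stabilized = weighted_moment_estimate(a, y, e, np.sqrt(e))

print(theta_uniform, theta_stabilized)
```

Both estimates target the same parameter (the mean reward of arm 1, which is 0 in this simulation). The difference the abstract points to concerns their sampling distributions: repeating the simulation over many seeds and comparing the studentized estimates against a standard normal is one way to examine that in this toy setting.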
