ML LG EM OC Mar 6, 2021

Asymptotic Theory for IV-Based Reinforcement Learning with Potential Endogeneity

Jin Li, Ye Luo, Zigan Wang, Xiaowei Zhang

arXiv:2103.04021v311.810 citations

Originality: Incremental advance

AI Analysis

This work addresses endogeneity issues in reinforcement learning for decision-makers in dynamic environments, representing an incremental advancement by extending IV methods to RL contexts.

The paper tackles the reinforcement bias that arises from the dynamic interaction between data generation and analysis in decision-making processes, proposing IV-based reinforcement learning algorithms to correct this bias and establishing their theoretical properties with inference formulas for optimal policies.

In the standard data analysis framework, data is collected once and for all, and then data analysis is carried out. However, with the advancement of digital technology, decision-makers constantly analyze past data and generate new data through their decisions. We model this as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias -- reinforcement bias -- that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the bias and establish their theoretical properties by incorporating them into a stochastic approximation (SA) framework. Our analysis accommodates iterate-dependent Markovian structures and can therefore be used to study RL algorithms with policy improvement. We also provide formulas for inference on the optimal policies of the IV-RL algorithms. These formulas highlight how intertemporal dependencies of the Markovian environment affect the inference.
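As a rough illustration of why an instrument helps inside a stochastic approximation recursion, the sketch below runs two Robbins-Monro style updates on a toy scalar model. It is not the paper's IV-RL algorithm: the data here are i.i.d. rather than Markovian or iterate-dependent, and the model `y = theta * x + eps`, the confounder `u`, the instrument `z`, the step size, and the function name `simulate` are all assumptions made for this example.

```python
import random


def simulate(n_steps=50_000, theta_true=2.0, seed=0):
    """Compare a naive SA update with an IV-based SA update (toy model).

    Assumed data-generating process (hypothetical, for illustration only):
        x   = z + u             regressor, endogenous because of u
        eps = u + 0.5 * noise   error term, correlated with x through u
        y   = theta_true * x + eps
    The instrument z is correlated with x but independent of eps.
    """
    rng = random.Random(seed)
    theta_naive = 0.0
    theta_iv = 0.0
    for t in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # instrument
        u = rng.gauss(0.0, 1.0)  # unobserved confounder
        x = z + u
        eps = u + 0.5 * rng.gauss(0.0, 1.0)
        y = theta_true * x + eps

        step = 1.0 / (t + 10)  # decaying step size; the offset damps early steps

        # Naive update: uses x itself as the "instrument", so its fixed point
        # solves E[x * (y - theta * x)] = 0 and inherits the endogeneity bias.
        theta_naive += step * x * (y - theta_naive * x)

        # IV update: its fixed point solves E[z * (y - theta * x)] = 0,
        # which holds at theta_true because z is independent of eps.
        theta_iv += step * z * (y - theta_iv * x)
    return theta_naive, theta_iv


theta_naive, theta_iv = simulate()
print(f"naive SA estimate: {theta_naive:.3f}")
print(f"IV SA estimate:    {theta_iv:.3f}")
```

In this design Cov(x, eps) = 1 and E[x^2] = 2, so the naive recursion settles near theta_true + 0.5 while the IV recursion settles near theta_true. The reinforcement bias described in the abstract arises when the data themselves depend on the current iterate, which this static toy deliberately leaves out.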
