Shuze Chen

2papers

2 Papers

MEJul 29, 2024
Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality

Shuze Chen, David Simchi-Levi, Chonghuan Wang

Utilizing randomized experiments to evaluate the effect of short-term treatments on the short-term outcomes has been well understood and become the golden standard in industrial practice. However, as service systems become increasingly dynamical and personalized, much focus is shifting toward maximizing long-term outcomes, such as customer lifetime value, through lifetime exposure to interventions. Our goal is to assess the impact of treatment and control policies on long-term outcomes from relatively short-term observations, such as those generated by A/B testing. A key managerial observation is that many practical treatments are local, affecting only targeted states while leaving other parts of the policy unchanged. This paper rigorously investigates whether and how such locality can be exploited to improve estimation of long-term effects in Markov Decision Processes (MDPs), a fundamental model of dynamic systems. We first develop optimal inference techniques for general A/B testing in MDPs and establish corresponding efficiency bounds. We then propose methods to harness the localized structure by sharing information on the non-targeted states. Our new estimator can achieve a linear reduction with the number of test arms for a major part of the variance without sacrificing unbiasedness. It also matches a tighter variance lower bound that accounts for locality. Furthermore, we extend our framework to a broad class of differentiable estimators, which encompasses many widely used approaches in practice. We show that all such estimators can benefit from variance reduction through information sharing without increasing their bias. Together, these results provide both theoretical foundations and practical tools for conducting efficient experiments in dynamic service systems with local treatments.

LGJun 3, 2025
Multi-agent Markov Entanglement

Shuze Chen, Tianyi Peng

Value decomposition has long been a fundamental technique in multi-agent dynamic programming and reinforcement learning (RL). Specifically, the value function of a global state $(s_1,s_2,\ldots,s_N)$ is often approximated as the sum of local functions: $V(s_1,s_2,\ldots,s_N)\approx\sum_{i=1}^N V_i(s_i)$. This approach traces back to the index policy in restless multi-armed bandit problems and has found various applications in modern RL systems. However, the theoretical justification for why this decomposition works so effectively remains underexplored. In this paper, we uncover the underlying mathematical structure that enables value decomposition. We demonstrate that a multi-agent Markov decision process (MDP) permits value decomposition if and only if its transition matrix is not "entangled" -- a concept analogous to quantum entanglement in quantum physics. Drawing inspiration from how physicists measure quantum entanglement, we introduce how to measure the "Markov entanglement" for multi-agent MDPs and show that this measure can be used to bound the decomposition error in general multi-agent MDPs. Using the concept of Markov entanglement, we proved that a widely-used class of index policies is weakly entangled and enjoys a sublinear $\mathcal O(\sqrt{N})$ scale of decomposition error for $N$-agent systems. Finally, we show how Markov entanglement can be efficiently estimated in practice, providing practitioners with an empirical proxy for the quality of value decomposition.