LGApr 1, 2024
Data-Driven Knowledge Transfer in Batch $Q^*$ LearningElynn Chen, Xi Chen, Wenbo Jing
In data-driven decision-making in marketing, healthcare, and education, it is desirable to utilize a large amount of data from existing ventures to navigate high-dimensional feature spaces and address data scarcity in new ventures. We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments and formally defining task discrepancies through the lens of Markov decision processes (MDPs). We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation, enabling the direct estimation of the optimal action-state function $Q^*$ using both target and source data. We establish the relationship between statistical performance and MDP task discrepancy under sieve approximation, shedding light on the impact of source and target sample sizes and task discrepancy on the effectiveness of knowledge transfer. We show that the final learning error of the $Q^*$ function is significantly improved from the single task rate both theoretically and empirically.
LGFeb 1, 2025
High-Dimensional Linear Bandits under Stochastic Latent HeterogeneityElynn Chen, Xi Chen, Wenbo Jing et al.
This paper addresses the critical challenge of stochastic latent heterogeneity in online decision-making, where individuals' responses to actions vary not only with observable contexts but also with unobserved, randomly realized subgroups. Existing data-driven approaches largely capture observable heterogeneity through contextual features but fail when the sources of variation are latent and stochastic. We propose a latent heterogeneous bandit framework that explicitly models probabilistic subgroup membership and group-specific reward functions, using promotion targeting as a motivating example. Our phased EM-greedy algorithm jointly learns latent group probabilities and reward parameters in high dimensions, achieving optimal estimation and classification guarantees. Our analysis reveals a new phenomenon unique to decision-making with stochastic latent subgroups: randomness in group realizations creates irreducible classification uncertainty, making sub-linear regret against a fully informed strong oracle fundamentally impossible. We establish matching upper and minimax lower bounds for both the strong and regular regrets, corresponding, respectively, to oracles with and without access to realized group memberships. The strong regret necessarily grows linearly, while the regular regret achieves a minimax-optimal sublinear rate. These findings uncover a fundamental stochastic barrier in online decision-making and point to potential remedies through simple strategic interventions and mechanism-design-based elicitation of latent information.