Online Statistical Inference in Decision-Making with Matrix Context
This work addresses a gap in online decision-making by enabling statistical inference with low-rank matrix contexts. The contribution is incremental in that it builds on existing reward-maximization algorithms, but it adds inference capabilities to them.
The paper tackles statistical inference in online decision-making with matrix contexts, where the model parameters have a low-rank structure. It proposes an online debiasing procedure that corrects both the bias from low-rank estimation and the bias from adaptive data collection, yielding asymptotically normal estimators and valid confidence intervals for the parameters and for the optimal policy value.
The study of online decision-making problems that leverage contextual information has drawn notable attention due to their significant applications in fields ranging from healthcare to autonomous systems. In modern applications, contextual information can be rich and is often represented as a matrix. Moreover, while existing online decision algorithms mainly focus on reward maximization, less attention has been devoted to statistical inference. To address these gaps, we consider an online decision-making problem with a matrix context where the true model parameters have a low-rank structure. We propose a fully online procedure to conduct statistical inference with adaptively collected data. The low-rank structure of the model parameter and the adaptive nature of the data collection process make inference difficult: standard low-rank estimators are biased and cannot be obtained sequentially, while existing inference approaches for sequential decision-making algorithms fail to account for the low-rank structure and are also biased. To overcome these challenges, we introduce a new online debiasing procedure that handles both sources of bias simultaneously. Our inference framework encompasses both parameter inference and optimal policy value inference. Theoretically, we establish the asymptotic normality of the proposed online debiased estimators and prove the validity of the constructed confidence intervals for both inference tasks. Our inference results are built upon a newly developed low-rank stochastic gradient descent estimator and its convergence result, which are also of independent interest.
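The abstract does not spell out the low-rank SGD estimator, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a hypothetical linear reward model r_t = <X_t, Θ> + noise with a d1 × d2 matrix context X_t, a known rank r, i.i.d. Gaussian contexts (not adaptively collected ones), and a fixed step size. It uses generic projected SGD: a gradient step on one observation's squared loss, followed by a truncated-SVD projection back onto rank-r matrices. This is not the authors' estimator, and it omits the online debiasing step and the action-selection policy entirely. The names `project_rank` and `low_rank_sgd` are illustrative only.

```python
import numpy as np


def project_rank(theta, r):
    """Best rank-r approximation (Frobenius norm) via truncated SVD."""
    u, s, vt = np.linalg.svd(theta, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r, :]


def low_rank_sgd(contexts, rewards, r, lr=0.05, theta0=None):
    """One pass of projected SGD for the ASSUMED model
    reward_t = <X_t, Theta> + noise, with rank(Theta) <= r.

    Each step takes a gradient step on 0.5 * (<X_t, Theta> - y_t)**2
    for a single observation, then projects onto rank-r matrices.
    """
    d1, d2 = contexts[0].shape
    theta = np.zeros((d1, d2)) if theta0 is None else theta0.copy()
    for x_t, y_t in zip(contexts, rewards):
        residual = np.sum(x_t * theta) - y_t   # <X_t, Theta> - y_t
        theta = theta - lr * residual * x_t     # single-sample gradient step
        theta = project_rank(theta, r)          # keep the iterate rank-r
    return theta


# Hypothetical setup: i.i.d. Gaussian contexts, NOT adaptive data collection.
rng = np.random.default_rng(0)
d1, d2, n = 6, 5, 4000
theta_true = np.outer(rng.normal(size=d1), rng.normal(size=d2))  # rank 1
contexts = [rng.normal(size=(d1, d2)) for _ in range(n)]
rewards = [np.sum(x * theta_true) + 0.1 * rng.normal() for x in contexts]

theta_hat = low_rank_sgd(contexts, rewards, r=1, lr=0.01)
rel_err = np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true)
print(f"relative error: {rel_err:.3f}")
```

Projecting after every update keeps the iterate exactly rank r at a cost of O(d1 · d2 · min(d1, d2)) per step. Like other low-rank estimators, such an iterate is generally biased, and the sketch makes no attempt at valid inference; that is the role of the online debiasing procedure described above.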