Functional Sequential Treatment Allocation with Covariates
This work addresses a theoretical problem in decision-making under uncertainty, with applications such as socio-economic policy. Its contribution appears incremental, since it extends existing bandit frameworks from the conditional mean to more general functionals.
The paper studies sequential treatment allocation with covariates in a multi-armed bandit setting, where the goal is to maximize a general functional of the conditional outcome distribution rather than only its mean. It derives lower bounds on expected regret and constructs a near-minimax-optimal policy.
We consider a multi-armed bandit problem with covariates. Given a realization of the covariate vector, instead of targeting the treatment with the highest conditional expectation, the decision maker targets the treatment that maximizes a general functional of the conditional potential outcome distribution, e.g., a conditional quantile, a trimmed mean, or a socio-economic functional such as an inequality, welfare, or poverty measure. We develop expected regret lower bounds for this problem and construct a near-minimax-optimal assignment policy.
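To make the difference between targeting the mean and targeting another functional concrete, here is a minimal sketch in Python. It is not the paper's assignment policy: it only evaluates a few plug-in functionals on outcome samples that are assumed to come from a single covariate cell, then picks the treatment with the largest value. All function names, the toy data, and the specific welfare measure (mean times one minus the Gini coefficient) are illustrative assumptions.

```python
import math

def mean(sample):
    """Sample mean, the usual bandit target."""
    return sum(sample) / len(sample)

def quantile(sample, q):
    """Empirical q-quantile: smallest order statistic with at least a fraction q at or below it."""
    xs = sorted(sample)
    idx = min(len(xs) - 1, max(0, math.ceil(q * len(xs)) - 1))
    return xs[idx]

def trimmed_mean(sample, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of observations."""
    xs = sorted(sample)
    k = int(trim * len(xs))
    core = xs[k:len(xs) - k]
    return sum(core) / len(core)

def gini_welfare(sample):
    """Illustrative welfare measure for nonnegative outcomes: mean * (1 - Gini)."""
    xs = sorted(sample)
    n = len(xs)
    mu = sum(xs) / n
    if mu == 0:
        return 0.0
    # Gini coefficient via the rank formula: sum_i (2i - n - 1) x_(i) / (n^2 * mean), i = 1..n
    gini = sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs)) / (n * n * mu)
    return mu * (1 - gini)

def best_treatment(outcomes_by_treatment, functional):
    """Return the treatment whose outcome sample maximizes the given functional."""
    return max(outcomes_by_treatment,
               key=lambda t: functional(outcomes_by_treatment[t]))

# Hypothetical outcomes observed for one covariate cell (toy data, not from the paper).
outcomes = {
    "A": [0, 0, 0, 0, 20],  # higher mean, very unequal
    "B": [3, 3, 3, 3, 3],   # lower mean, perfectly equal
}

choices = {
    "mean": best_treatment(outcomes, mean),
    "median": best_treatment(outcomes, lambda s: quantile(s, 0.5)),
    "trimmed_mean": best_treatment(outcomes, trimmed_mean),
    "gini_welfare": best_treatment(outcomes, gini_welfare),
}
print(choices)
```

On this toy data the mean favors treatment A, while the median, the trimmed mean, and the welfare measure all favor treatment B, which is the kind of disagreement that motivates targeting functionals other than the conditional expectation.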