Dynamically Optimal Treatment Allocation
This work addresses dynamic decision-making in economic policy and offers a method to improve welfare in domains like job training, though the contribution appears incremental because it builds on existing reinforcement learning advances.
The paper tackles personalized dynamic treatment allocation under budget and capacity constraints by leveraging existing randomized control trial data. It proves that regret decays at a rate of n^(-0.5), the same as in the static case, and reports significantly higher welfare in job training applications than static methods achieve.
Dynamic decisions are pivotal to economic policy making. We show how existing evidence from randomized control trials can be utilized to guide personalized decisions in challenging dynamic environments with budget and capacity constraints. Recent advances in reinforcement learning now enable the solution of many complex, real-world problems for the first time. We allow for restricted classes of policy functions and prove that their regret decays at rate n^(-0.5), the same as in the static case. Applying our methods to job training, we find that by exploiting the problem's dynamic structure, we achieve significantly higher welfare compared to static approaches.
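To make the regret claim concrete, the following is a minimal sketch of what a statement of this kind typically looks like. The notation is assumed for illustration and is not taken from the paper: \Pi stands for the restricted policy class, W(\pi) for the welfare attained by policy \pi, and \hat{\pi}_n for the policy estimated from n trial observations. Whether the bound holds in probability or in expectation, and how the constant depends on the complexity of \Pi, are likewise assumptions here.

```latex
% Assumed notation (not from the source):
%   \Pi          restricted class of policy functions
%   W(\pi)       welfare attained by policy \pi
%   \hat{\pi}_n  policy estimated from n RCT observations
\[
  \mathcal{R}(\hat{\pi}_n)
  \;=\;
  \sup_{\pi \in \Pi} W(\pi) \;-\; W(\hat{\pi}_n),
  \qquad
  \mathcal{R}(\hat{\pi}_n) \;=\; O_p\!\left(n^{-1/2}\right).
\]
```

Read this way, regret is measured against the best policy within the restricted class rather than against an unrestricted optimum, and at this rate quadrupling the sample size would roughly halve the bound.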