Policy Learning for Malaria Control
This work addresses a domain-specific challenge in public health by applying reinforcement learning to malaria control with incremental improvements in handling data scarcity.
The authors tackle the problem of learning malaria control policies from very limited observational data, and placed 7th in the KDD Cup 2019 challenge using Q-learning with sequence breaking.
Sequential decision making is a core problem in reinforcement learning, and many algorithms exist to solve it. However, only a few of them work effectively with a very small number of observations. In this report, we describe our progress in learning a policy for malaria control, framed as a reinforcement learning problem in the KDD Cup 2019 challenge, and propose several solutions to the limited-observations problem. We apply a Genetic Algorithm, Bayesian Optimization, and Q-learning with sequence breaking to find the optimal policy over five consecutive years with a budget of only 20 episodes (100 evaluations). We evaluate these algorithms and compare their performance against Random Search as a baseline. Of these algorithms, Q-learning with sequence breaking was submitted to the challenge and ranked 7th in the KDD Cup.
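To make the evaluation budget concrete, the following is a minimal sketch of a Random Search baseline under the stated limits. It assumes that each episode evaluates one five-year policy, one action per year, so that 20 episodes correspond to 100 evaluations. The function `toy_reward`, the two-component action in [0, 1], and all other names are hypothetical placeholders for illustration; they are not the challenge simulator or the authors' implementation.

```python
import random

YEARS = 5      # policy horizon stated in the text
EPISODES = 20  # episode budget stated in the text


def toy_reward(year, action):
    """Hypothetical stand-in for the simulator (NOT the real environment)."""
    a, b = action
    target = 0.2 * year  # arbitrary year-dependent optimum, for illustration only
    return 1.0 - (a - target) ** 2 - (b - 0.5) ** 2


def run_episode(policy):
    """Evaluate one five-year policy; return (total reward, evaluations used)."""
    total = sum(toy_reward(year, action) for year, action in enumerate(policy))
    return total, len(policy)


def random_search(episodes=EPISODES, years=YEARS, seed=0):
    """Sample policies uniformly at random and keep the best one seen."""
    rng = random.Random(seed)
    best_policy, best_reward, evaluations = None, float("-inf"), 0
    for _ in range(episodes):
        # Assumption: an action is a pair of values in [0, 1], one pair per year.
        policy = [(rng.random(), rng.random()) for _ in range(years)]
        reward, used = run_episode(policy)
        evaluations += used
        if reward > best_reward:
            best_policy, best_reward = policy, reward
    return best_policy, best_reward, evaluations


if __name__ == "__main__":
    policy, reward, evaluations = random_search()
    print(f"evaluations used: {evaluations}")
    print(f"best total reward: {reward:.3f}")
```

With so few samples, uniform sampling covers the policy space very sparsely, which is why it serves here only as a baseline against which the other methods are compared.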