Insulin Regimen ML-based control for T2DM patients
This work addresses personalized insulin regimen optimization for T2DM patients, but it appears incremental as it applies standard MDP and RL methods to this domain.
The authors tackled the problem of controlling blood glucose levels in Type 2 Diabetes Mellitus (T2DM) patients by modeling it as a Markov Decision Process (MDP) and using model-based reinforcement learning to derive an optimal insulin treatment policy, aiming to maximize a reward function for healthy glucose levels.
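As a rough illustration of the pipeline summarized above, the following minimal sketch estimates transition probabilities from logged (state, action, next state) triples and then solves the resulting MDP by value iteration. It is not the authors' implementation. The three BGL states, the two actions, the reward values, the additive smoothing prior, the discount factor `gamma`, and the toy log are all assumptions made for the example; the source only says that the total long-run reward is maximized and does not specify a discounted criterion.

```python
# Hypothetical setup (not from the paper): 3 BGL states and 2 actions.
# States: 0 = low, 1 = healthy, 2 = high. Actions: 0 = no insulin, 1 = insulin dose.

def estimate_transitions(log, n_states, n_actions, prior=1.0):
    """Model-based step: estimate P[a][s][s'] from (s, a, s') counts.

    `prior` is an assumed additive-smoothing pseudo-count so that
    unobserved transitions keep a small non-zero probability.
    """
    counts = [[[prior] * n_states for _ in range(n_states)]
              for _ in range(n_actions)]
    for s, a, s_next in log:
        counts[a][s][s_next] += 1.0
    return [[[c / sum(row) for c in row] for row in counts[a]]
            for a in range(n_actions)]

def q_values(P, reward, V, gamma):
    """Q[s][a] = sum over s' of P(s'|s,a) * (reward[s'] + gamma * V[s'])."""
    n_actions, n_states = len(P), len(reward)
    return [[sum(P[a][s][t] * (reward[t] + gamma * V[t])
                 for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)]

def value_iteration(P, reward, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Solve the estimated MDP; returns state values and a greedy policy."""
    V = [0.0] * len(reward)
    for _ in range(max_iter):
        V_new = [max(q) for q in q_values(P, reward, V, gamma)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    Q = q_values(P, reward, V, gamma)
    policy = [max(range(len(P)), key=lambda a: Q[s][a])
              for s in range(len(reward))]
    return V, policy

# Toy log of observed transitions (illustrative only, not patient data).
log = ([(2, 1, 1)] * 20 + [(2, 0, 2)] * 20 +   # high: insulin helps
       [(1, 0, 1)] * 20 + [(1, 1, 0)] * 20 +   # healthy: insulin overshoots
       [(0, 0, 1)] * 20 + [(0, 1, 0)] * 20)    # low: insulin keeps it low

reward = [-1.0, 1.0, -1.0]  # assumed heuristic: positive only for healthy BGL
P = estimate_transitions(log, n_states=3, n_actions=2)
V, policy = value_iteration(P, reward)
print(policy)  # → [0, 0, 1]
```

On this toy log the recovered policy withholds insulin in the low and healthy states and doses in the high state. A real state space would need to be much richer, as the abstract notes, for the memoryless assumption to be plausible.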
\begin{abstract} We model the blood glucose level (BGL) of an individual T2DM patient as a stochastic process with a discrete number of states, governed mainly but not solely by the medication regimen (e.g. insulin injections). BGL states also change in response to various physiological triggers, which make the process stochastic and statistically unknown, yet assumed to be quasi-stationary. To express the incentive for staying at a desired healthy BGL, we heuristically define a reward function that returns positive values for desirable BGLs and negative values for undesirable ones. The state space contains enough states to justify the memoryless assumption. This, in turn, allows us to formulate a Markov Decision Process (MDP) whose objective is to maximize the total reward accumulated over the long run. The probability law is learned by model-based reinforcement learning (RL), and the optimal insulin treatment policy is obtained from the MDP solution.