LG · May 10

Learning to Compress Time-to-Control: A Reinforcement Learning Framework for Chronic Disease Management

arXiv:2605.09818 · 21.8
Predicted impact: top 73% in LG (last 90 days) · Originality: Incremental advance
AI Analysis

For healthcare RL researchers, this work frames chronic disease management as a more tractable setting than acute care, but its results are limited to synthetic simulations.

The paper proposes a reinforcement learning framework for chronic disease management that compresses time-to-control using a tiered reward and a two-loop architecture coupling preference learning with RL. Simulation results on hypertension and type 2 diabetes show capability-weighted offline RL outperforms uniform-weighted offline RL by 15 percentage points on T2D TTC.

Reinforcement learning (RL) in healthcare has had mixed results, with reward sparsity, unreliable off-policy evaluation, and the deployment-simulation gap as recurring failure modes. We argue that chronic disease management is structurally a more tractable RL setting than the acute-care problems the field has primarily studied, but only if the problem is formalized to exploit chronic care's properties. We propose such a formalization. The agent's objective is to compress time-to-control (TTC) under a tiered reward calibrated to the CMS ACCESS Model. Two quantities from our companion preference-learning paper [Singh et al. 2026] enter as load-bearing structural elements: the execution intensity ε bounds action availability under a constrained Markov Decision Process, and the clinician capability κ weights offline-data transitions during RL training. Together they couple preference learning and RL into a two-loop architecture. We present simulation results on synthetic state machines for hypertension and type 2 diabetes. Capability-weighted offline RL outperforms uniform-weighted offline RL and the behavior policy by 15 percentage points on T2D TTC; the uniform-weighted formulation (the standard in existing healthcare RL) underperforms even the heterogeneous behavior policy. ε-aware policies generalize across deployment regimes, while ε-naive policies do not.
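
The abstract does not spell out how ε and κ enter training, so the following is only a minimal illustrative sketch under stated assumptions, not the paper's method. It assumes a tabular Q-function, that each action has a hypothetical required intensity and is available only when that requirement is at most ε, and that κ is a per-transition weight in [0, 1] applied to a squared TD error. All names (`Transition`, `available_actions`, `weighted_td_loss`, `required_intensity`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    state: int
    action: int
    reward: float
    next_state: int
    done: bool
    kappa: float  # ASSUMPTION: clinician capability weight in [0, 1]


def available_actions(required_intensity, epsilon):
    """Indices of actions whose (assumed) intensity requirement fits within epsilon."""
    return [a for a, need in enumerate(required_intensity) if need <= epsilon]


def weighted_td_loss(q, batch, required_intensity, epsilon, gamma=0.99, uniform=False):
    """Weighted mean squared TD error over an offline batch.

    q is a list of per-state lists of action values. The bootstrap target
    maximizes only over actions available at execution intensity epsilon.
    With uniform=True every transition gets weight 1 (the uniform baseline);
    otherwise each transition is weighted by its kappa.
    """
    allowed = available_actions(required_intensity, epsilon)
    if not allowed:
        raise ValueError("no action is available at this execution intensity")
    total, weight_sum = 0.0, 0.0
    for t in batch:
        w = 1.0 if uniform else t.kappa
        bootstrap = 0.0 if t.done else gamma * max(q[t.next_state][a] for a in allowed)
        td_error = t.reward + bootstrap - q[t.state][t.action]
        total += w * td_error ** 2
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0


# Toy data: two states, two actions; action 1 needs more execution intensity.
q = [[0.0, 0.0], [1.0, 2.0]]
required_intensity = [0.2, 0.8]
batch = [
    Transition(state=0, action=0, reward=0.0, next_state=1, done=False, kappa=1.0),
    Transition(state=0, action=1, reward=3.0, next_state=1, done=True, kappa=0.0),
]
```

In this toy setup, a transition with κ = 0 contributes nothing to the capability-weighted loss but counts fully under uniform weighting, and lowering ε removes the high-intensity action from the bootstrap target. How the paper actually estimates κ and ε, and which offline RL algorithm it uses, is not stated in the abstract.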
