Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management
This work addresses efficiency and adaptability in care coordination for Medicaid and safety-net populations, but its contribution is incremental: it builds on existing offline RL methods with targeted enhancements.
The paper tackles the problem of optimizing care coordination and population health management by proposing a lightweight offline reinforcement learning approach that uses test-time learning and inference-time deliberation to balance efficiency and adaptability. On an operational dataset, the approach yields stable value estimates, predictable efficiency trade-offs, and support for subgroup auditing.
Care coordination and population health management programs serve large Medicaid and safety-net populations and must be auditable, efficient, and adaptable. Although the clinical risk of outreach modalities is typically low, their time and opportunity costs differ substantially across text, phone, video, and in-person visits. We propose a lightweight offline reinforcement learning (RL) approach that augments trained policies with (i) test-time learning (TTL) via local neighborhood calibration and (ii) inference-time deliberation (ITD) via a small Q-ensemble that incorporates predictive uncertainty and time/effort cost. The method exposes transparent dials for neighborhood size and for the uncertainty and cost penalties, and it preserves an auditable training pipeline. Evaluated on a de-identified operational dataset, TTL+ITD yields stable value estimates, predictable efficiency trade-offs, and support for subgroup auditing.
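To make the three dials concrete, here is a minimal sketch of how neighborhood calibration and penalized ensemble deliberation could combine at decision time. It is not the paper's implementation. It assumes that "local neighborhood calibration" means adding the mean residual (observed return minus ensemble-mean Q) of the k nearest logged states that took the same action, and that deliberation scores each modality as calibrated mean Q minus `lam_unc` times the ensemble standard deviation minus `lam_cost` times an effort cost. The function names, modality cost values, and toy Q-functions are all hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical relative effort costs per outreach modality (illustrative, not from the paper).
MODALITY_COST = {"text": 0.1, "phone": 0.3, "video": 0.5, "in_person": 1.0}
ACTIONS = list(MODALITY_COST)


def knn_calibration(state, action, history, q_ensemble, k):
    """Assumed TTL step: mean residual (observed return minus ensemble-mean Q)
    over the k logged states nearest to `state` that took the same action."""
    same_action = [(s, r) for s, a, r in history if a == action]
    if not same_action:
        return 0.0  # no local evidence for this action: no correction
    # Order logged states by Euclidean distance to the query state.
    same_action.sort(key=lambda sr: math.dist(sr[0], state))
    residuals = [r - mean(q(s, action) for q in q_ensemble) for s, r in same_action[:k]]
    return mean(residuals)


def deliberate(state, history, q_ensemble, k=5, lam_unc=1.0, lam_cost=1.0):
    """Assumed ITD step: choose the action maximizing calibrated ensemble mean
    minus an uncertainty penalty (ensemble std) and an effort-cost penalty."""
    scores = {}
    for a in ACTIONS:
        qs = [q(state, a) for q in q_ensemble]
        calibrated = mean(qs) + knn_calibration(state, a, history, q_ensemble, k)
        scores[a] = calibrated - lam_unc * pstdev(qs) - lam_cost * MODALITY_COST[a]
    return max(scores, key=scores.get), scores


# Toy three-member Q-ensemble over a 2-D state (hypothetical stand-in for trained critics).
q_ensemble = [
    lambda s, a, b=b: s[0] + 2.0 * MODALITY_COST[a] + b
    for b in (-0.1, 0.0, 0.1)
]

# Raising the cost penalty shifts the choice toward lower-effort modalities.
print(deliberate((1.0, 0.0), [], q_ensemble, lam_cost=0.5)[0])
print(deliberate((1.0, 0.0), [], q_ensemble, lam_cost=3.0)[0])
```

In this toy setup the ensemble spread is identical for every action, so only the cost penalty and the local calibration term change the ranking. With real critics, `lam_unc` would additionally steer the policy away from modalities the logged data covers poorly.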