Learning Sequential Decisions from Multiple Sources via Group-Robust Markov Decision Processes
This work addresses robust planning from multi-site data in domains such as healthcare. The contribution is incremental: it builds on existing distributionally robust MDPs, adding a group-linear structure with feature-wise uncertainty sets.
The paper tackles the problem of learning robust sequential decision-making policies from heterogeneous, offline, multi-site datasets. It introduces a group-robust Markov decision process framework with feature-wise uncertainty sets, together with an offline algorithm, and proves a suboptimality bound for the learned policy under a robust partial coverage assumption.
We often collect data from multiple sites (e.g., hospitals) that share common structure but also exhibit heterogeneity. This paper aims to learn robust sequential decision-making policies from such offline, multi-site datasets. To model cross-site uncertainty, we study distributionally robust MDPs with a group-linear structure: all sites share a common feature map, and both the transition kernels and expected reward functions are linear in these shared features. We introduce feature-wise (d-rectangular) uncertainty sets, which preserve tractable robust Bellman recursions while maintaining key cross-site structure. Building on this, we then develop an offline algorithm based on pessimistic value iteration that includes: (i) per-site ridge regression for Bellman targets, (ii) feature-wise worst-case (row-wise minimization) aggregation, and (iii) a data-dependent pessimism penalty computed from the diagonals of the inverse design matrices. We further propose a cluster-level extension that pools similar sites to improve sample efficiency, guided by prior knowledge of site similarity. Under a robust partial coverage assumption, we prove a suboptimality bound for the resulting policy. Overall, our framework addresses multi-site learning with heterogeneous data sources and provides a principled approach to robust planning without relying on strong state-action rectangularity assumptions.
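As a rough illustration of steps (i) to (iii), the following minimal NumPy sketch performs one backward step of a pessimistic, feature-wise robust value iteration. It is not the paper's algorithm: the function names (`site_ridge`, `group_robust_backup`, `q_values`), the tabular indexing of `V_next` by next-state id, the penalty form `beta * sqrt(diag(Lambda_k^{-1}))` subtracted per coordinate before the minimum, and the clipping range are all assumptions made for illustration. It also assumes nonnegative features, as is typical for d-rectangular linear MDPs.

```python
import numpy as np

def site_ridge(Phi, y, lam=1.0):
    """Step (i): ridge regression of Bellman targets y on features Phi for one site.

    Returns the weight vector and the diagonal of the inverse design matrix.
    """
    d = Phi.shape[1]
    Lambda = Phi.T @ Phi + lam * np.eye(d)   # regularized design matrix
    Lambda_inv = np.linalg.inv(Lambda)
    w = Lambda_inv @ (Phi.T @ y)
    return w, np.diag(Lambda_inv)

def group_robust_backup(site_data, V_next, beta=1.0, lam=1.0):
    """One backward step across K sites.

    site_data: list of (Phi, rewards, next_state_ids), one tuple per site.
    V_next:    1-D array of next-step values indexed by state id (assumed tabular).
    Returns a length-d weight vector that is pessimistic and feature-wise robust.
    """
    weights, penalties = [], []
    for Phi, r, s_next in site_data:
        y = r + V_next[s_next]                     # Bellman targets for this site
        w, diag_inv = site_ridge(Phi, y, lam)
        weights.append(w)
        penalties.append(beta * np.sqrt(diag_inv))  # step (iii): assumed penalty form
    W = np.stack(weights)      # shape (K, d)
    P = np.stack(penalties)    # shape (K, d)
    # Step (ii): feature-wise worst case, i.e. a minimum over sites per coordinate.
    return np.min(W - P, axis=0)

def q_values(phi_sa, w_robust, max_value):
    """Pessimistic Q estimates, clipped to the assumed range [0, max_value]."""
    return np.clip(phi_sa @ w_robust, 0.0, max_value)
```

Taking the minimum separately for each feature coordinate, rather than picking a single worst site, is what the feature-wise (d-rectangular) structure permits; with nonnegative features, the resulting Q estimate lower-bounds every site's penalized estimate.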