AI | LG | Dec 30, 2024

Predicting Long Term Sequential Policy Value Using Softer Surrogates

arXiv:2412.20638v2 | h-index: 54
Originality: Incremental advance
AI Analysis

The paper addresses a key challenge in domains like healthcare, where novel treatments require costly long-term trials. The advance is incremental, since it builds on existing surrogacy conditions and OPE frameworks.

The paper tackles the problem of predicting long-term policy outcomes when new policies introduce novel actions, which existing off-policy evaluation methods cannot handle. In simulated healthcare examples for HIV and sepsis management, the authors' estimators accurately predict policy value after observing only 10% of the full-horizon data.

Off-policy policy evaluation (OPE) estimates the outcome of a new policy using historical data collected from a different policy. However, existing OPE methods cannot handle cases where the new policy introduces novel actions. This issue commonly occurs in real-world domains such as healthcare, where new drugs and treatments are continuously developed. Novel actions necessitate on-policy data collection, which can be burdensome and expensive if the outcome of interest takes a substantial amount of time to observe, for example in multi-year clinical trials. This raises a key question: how can we predict the long-term outcome of a policy after observing only its short-term effects? Though this problem is intractable in general, under some surrogacy conditions the short-term on-policy data can be combined with the long-term historical data to make accurate predictions about the new policy's long-term value. In two simulated healthcare examples (HIV and sepsis management), we show that our estimators can provide accurate predictions about the policy value after observing only 10% of the full-horizon data. We also provide a finite-sample analysis of our doubly robust estimators.
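
To make the idea of combining short-term on-policy data with long-term historical data concrete, here is a minimal sketch of a regression-based plug-in estimate. It is not the paper's doubly robust estimator. It assumes a simplified form of surrogacy in which the long-term outcome depends on the policy only through a scalar short-term summary, and it uses synthetic data. The function names, the linear outcome model, and all numbers are hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_surrogate_model(short_hist, long_hist):
    """Least-squares fit of long-term outcome on a short-term summary.

    Assumption (illustrative): a linear model with intercept is adequate.
    """
    X = np.column_stack([np.ones(len(short_hist)), short_hist])
    coef, *_ = np.linalg.lstsq(X, long_hist, rcond=None)
    return coef


def predict_long_term_value(coef, short_new):
    """Average the fitted long-term predictions over new-policy short-term data."""
    X = np.column_stack([np.ones(len(short_new)), short_new])
    return float(np.mean(X @ coef))


rng = np.random.default_rng(0)
n_hist, n_new = 5000, 500

# Historical data: full-horizon trajectories, so both the short-term
# summary and the long-term outcome are observed.
short_hist = rng.normal(0.0, 1.0, n_hist)
long_hist = 1.0 + 2.0 * short_hist + rng.normal(0.0, 0.5, n_hist)

# New policy (with a novel action): only short-term data is observed.
# Its short-term summary is shifted relative to the historical policy.
short_new = rng.normal(1.0, 1.0, n_new)

coef = fit_surrogate_model(short_hist, long_hist)
estimate = predict_long_term_value(coef, short_new)

# Under the synthetic model above, the true long-term value of the new
# policy is 1.0 + 2.0 * 1.0 = 3.0.
print(f"estimated long-term value: {estimate:.3f}")
```

The estimate is only as good as the surrogacy assumption and the fitted outcome model. If the novel action affects the long-term outcome through a path that the short-term summary does not capture, this plug-in estimate would be biased.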

