Detecting critical treatment effect bias in small subgroups
This work addresses bias in observational studies used for medical decision-making and offers a method to assess their reliability. The contribution is incremental because it builds on existing benchmarking approaches.
The paper benchmarks observational studies against randomized trials to detect treatment effect bias in small subgroups. It proposes a statistical test and a bound on bias strength, and validates both in a real-world setting, where the resulting conclusions align with established medical knowledge.
Randomized trials are considered the gold standard for making informed decisions in medicine, yet they often lack generalizability to the patient populations in clinical practice. Observational studies, on the other hand, cover a broader patient population but are prone to various biases. Thus, before using an observational study for decision-making, it is crucial to benchmark its treatment effect estimates against those derived from a randomized trial. We propose a novel strategy to benchmark observational studies beyond the average treatment effect. First, we design a statistical test for the null hypothesis that the treatment effects estimated from the two studies, conditioned on a set of relevant features, differ by at most a given tolerance. We then estimate an asymptotically valid lower bound on the maximum bias strength for any subgroup in the observational study. Finally, we validate our benchmarking strategy in a real-world setting and show that it leads to conclusions that align with established medical knowledge.
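The abstract describes the test and the bias bound only at a high level. The sketch below shows one simple way the two steps can fit together. It rests on several assumptions that do not come from the paper:

- The subgroups are discrete and pre-specified.
- The two studies provide independent samples.
- A plain difference in means serves as the per-subgroup effect estimator in both studies. In practice, the observational estimate would come from whatever adjusted estimator that study uses.
- A Bonferroni correction is applied across subgroups.

Under these assumptions, the sketch computes a simultaneous lower confidence bound on the largest subgroup bias. The test then rejects the null hypothesis when that bound exceeds the tolerance. All names (`ArmData`, `max_bias_lower_bound`, `reject_null`) are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist, mean, variance


@dataclass
class ArmData:
    """Hypothetical layout: outcomes of treated and control units in one subgroup."""
    treated: list[float]
    control: list[float]


def diff_in_means(arm: ArmData) -> tuple[float, float]:
    """Difference-in-means effect estimate and its standard error.

    Assumption: stands in for whatever effect estimator each study actually uses.
    """
    est = mean(arm.treated) - mean(arm.control)
    se = sqrt(variance(arm.treated) / len(arm.treated)
              + variance(arm.control) / len(arm.control))
    return est, se


def max_bias_lower_bound(rct: dict[str, ArmData], obs: dict[str, ArmData],
                         alpha: float = 0.05) -> float:
    """Simultaneous (1 - alpha) lower confidence bound on max_g |bias_g|.

    bias_g = tau_obs(g) - tau_rct(g). A two-sided Bonferroni critical value over
    the G shared subgroups makes |bias_hat_g| - crit * se_g <= |bias_g| hold for
    every g with probability >= 1 - alpha (asymptotically, via normality).
    """
    groups = sorted(rct.keys() & obs.keys())
    if not groups:
        raise ValueError("the two studies share no subgroups")
    crit = NormalDist().inv_cdf(1 - alpha / (2 * len(groups)))
    bounds = []
    for g in groups:
        t_rct, se_rct = diff_in_means(rct[g])
        t_obs, se_obs = diff_in_means(obs[g])
        # Independent samples, so the variances of the two estimates add.
        se = sqrt(se_rct ** 2 + se_obs ** 2)
        bounds.append(abs(t_obs - t_rct) - crit * se)
    # The maximum absolute bias is non-negative, so clipping at 0 stays valid.
    return max(0.0, max(bounds))


def reject_null(rct: dict[str, ArmData], obs: dict[str, ArmData],
                tolerance: float, alpha: float = 0.05) -> bool:
    """Reject H0: |bias_g| <= tolerance for all g iff the lower bound exceeds it."""
    return max_bias_lower_bound(rct, obs, alpha) > tolerance
```

Deriving the test from the bound keeps the two steps consistent. If every subgroup bias is within the tolerance, the bound exceeds the tolerance with probability at most alpha. Bonferroni is conservative, and the abstract's test conditions on a set of features rather than on a fixed list of subgroups, which this sketch only approximates.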