AI LG ML
Apr 27, 2022

Counterfactual harm

Jonathan G. Richens, Rory Beard, Daniel H. Thompson

arXiv:2204.12993v524.438 citations
h-index: 10

Originality: Highly original

AI Analysis

This paper addresses the critical issue of ethical and safe decision-making for AI agents, particularly in high-stakes domains like healthcare. Its contribution is foundational rather than incremental.

The paper tackles the problem of measuring and avoiding harm in algorithmic decision-making. It proposes the first formal definition of harm and benefit using causal models, and demonstrates that this counterfactual approach identifies drug doses that are significantly less harmful than those selected by standard methods, without sacrificing efficacy.

To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. However, to date there is no statistical method for measuring harm and factoring it into algorithmic decisions. In this paper we propose the first formal definition of harm and benefit using causal models. We show that any factual definition of harm must violate basic intuitions in certain scenarios, and show that standard machine learning algorithms that cannot perform counterfactual reasoning are guaranteed to pursue harmful policies following distributional shifts. We use our definition of harm to devise a framework for harm-averse decision making using counterfactual objective functions. We demonstrate this framework on the problem of identifying optimal drug doses using a dose-response model learned from randomized control trial data. We find that the standard method of selecting doses using treatment effects results in unnecessarily harmful doses, while our counterfactual approach allows us to identify doses that are significantly less harmful without sacrificing efficacy.
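The gap between treatment effects and counterfactual harm described in the abstract can be illustrated with a toy example. The sketch below is not the paper's dose-response model. It assumes a binary outcome, "no treatment" as the default action, and a known joint distribution over counterfactual outcomes (response types). That distribution cannot be read off trial data alone and requires causal modelling assumptions. The class `ResponseTypes`, the function names, the weight `lam`, and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseTypes:
    """Hypothetical joint distribution over (Y_untreated, Y_treated); Y = 1 means recovery."""
    always_recover: float  # Y_untreated = 1, Y_treated = 1
    helped: float          # Y_untreated = 0, Y_treated = 1
    hurt: float            # Y_untreated = 1, Y_treated = 0
    never_recover: float   # Y_untreated = 0, Y_treated = 0

    def __post_init__(self):
        total = self.always_recover + self.helped + self.hurt + self.never_recover
        if not math.isclose(total, 1.0):
            raise ValueError("response-type probabilities must sum to 1")


def treatment_effect(r: ResponseTypes) -> float:
    """Average treatment effect: P(recover | treated) - P(recover | untreated)."""
    return r.helped - r.hurt


def expected_harm(r: ResponseTypes) -> float:
    """Probability that a patient who would have recovered untreated fails to recover when treated."""
    return r.hurt


def harm_averse_score(r: ResponseTypes, lam: float) -> float:
    """Illustrative objective: treatment effect penalised by lam times expected harm."""
    return treatment_effect(r) - lam * expected_harm(r)


# Both treatments give an 80% recovery rate against a 50% untreated baseline,
# so a trial comparing recovery rates cannot tell them apart.
treatment_a = ResponseTypes(always_recover=0.5, helped=0.3, hurt=0.0, never_recover=0.2)
treatment_b = ResponseTypes(always_recover=0.3, helped=0.5, hurt=0.2, never_recover=0.0)

for name, r in [("A", treatment_a), ("B", treatment_b)]:
    print(name, treatment_effect(r), expected_harm(r), harm_averse_score(r, lam=1.0))
```

In this toy setting the two treatments have the same treatment effect, but only treatment B harms patients who would have recovered without it, so any `lam > 0` makes the harm-averse score prefer treatment A.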
