RM LG ML · Nov 12, 2019

An Unethical Optimization Principle

Nicholas Beale, Heather Battey, Anthony C. Davison, Robert S. MacKay

arXiv:1911.05116v12.37 citations

Originality: Incremental advance

AI Analysis

This work addresses a foundational issue in AI safety and ethics, with implications for policy and for detecting unethical strategies, though its contribution is incremental: it formalizes an existing concern.

The paper tackles the problem of AI systems disproportionately selecting unethical strategies when optimizing for risk-adjusted returns, showing that the probability of choosing an unethical strategy tends to unity as the strategy space grows large unless returns are fat-tailed.

If an artificial intelligence aims to maximise risk-adjusted return, then under mild conditions it is disproportionately likely to pick an unethical strategy unless the objective function allows sufficiently for this risk. Even if the proportion $η$ of available unethical strategies is small, the probability ${p_U}$ of picking an unethical strategy can become large; indeed unless returns are fat-tailed ${p_U}$ tends to unity as the strategy space becomes large. We define an Unethical Odds Ratio Upsilon ($Υ$) that allows us to calculate ${p_U}$ from $η$, and we derive a simple formula for the limit of $Υ$ as the strategy space becomes large. We give an algorithm for estimating $Υ$ and ${p_U}$ in finite cases and discuss how to deal with infinite strategy spaces. We show how this principle can be used to help detect unethical strategies and to estimate $η$. Finally we sketch some policy implications of this work.
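The effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is a toy model, not the authors' algorithm: it assumes Gaussian (light-tailed) returns, assumes unethical strategies earn a slightly higher mean return (`mean_shift`), and assumes $Υ$ is the usual odds ratio of ${p_U}$ against $η$, as its name suggests. The function names and parameters are hypothetical.

```python
import random


def estimate_p_unethical(n_strategies, eta, mean_shift=0.5, sd_ratio=1.0,
                         trials=2000, seed=0):
    """Estimate p_U: the chance that the best-returning strategy is unethical.

    Toy assumptions (not taken from the paper): ethical returns ~ N(0, 1),
    unethical returns ~ N(mean_shift, sd_ratio), all independent.
    """
    rng = random.Random(seed)
    # Keep at least one strategy of each kind so both maxima are defined.
    n_unethical = min(n_strategies - 1, max(1, round(eta * n_strategies)))
    n_ethical = n_strategies - n_unethical
    hits = 0
    for _ in range(trials):
        best_ethical = max(rng.gauss(0.0, 1.0) for _ in range(n_ethical))
        best_unethical = max(rng.gauss(mean_shift, sd_ratio)
                             for _ in range(n_unethical))
        # A pure return maximiser picks whichever strategy scored highest.
        if best_unethical > best_ethical:
            hits += 1
    return hits / trials


def odds_ratio(p_u, eta):
    """Odds of picking an unethical strategy relative to the baseline odds.

    Assumed form: (p_U / (1 - p_U)) / (eta / (1 - eta)).
    """
    if p_u >= 1.0:
        return float("inf")
    return (p_u / (1.0 - p_u)) / (eta / (1.0 - eta))


if __name__ == "__main__":
    eta = 0.1
    for n in (10, 100, 1000):
        p_u = estimate_p_unethical(n, eta, mean_shift=1.0)
        print(f"N={n:5d}  p_U={p_u:.3f}  odds ratio={odds_ratio(p_u, eta):.2f}")
```

With `mean_shift=0.0` the estimate should stay close to $η$, since nothing then distinguishes the two groups; a positive shift lets the small unethical group win the maximum far more often than its share would suggest.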

