A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety
This work extends the definition of black swan events to include those caused by human irrationality, with implications for AI safety and risk assessment in unchanging environments.
The paper challenges the standard view that black swan events arise only from unpredictable environments. It argues that they can also occur in unchanging environments because humans misperceive the value and likelihood of events, and it terms these spatial black swan events. It then mathematically formalizes the definitions to guide the development of algorithms that prevent such events.
Black swan events are statistically rare occurrences that carry extremely high risks. The typical view assumes that black swan events originate from unpredictable, time-varying environments; however, the community lacks a comprehensive definition of black swan events. To this end, this paper argues that the standard view is incomplete and claims that high-risk, statistically rare events can also occur in unchanging environments due to human misperception of their value and likelihood; we call these spatial black swan events. We first carefully categorize black swan events, focusing on spatial black swan events, and then mathematically formalize the definition of black swan events. We hope these definitions can pave the way for the development of algorithms that prevent such events by rationally correcting human perception.
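As a purely illustrative sketch of the central claim, and not the paper's formalization, the following Python snippet shows how a rare, high-cost outcome in a fixed environment can be overlooked when its likelihood is misperceived. The outcome table, the `underweight_rare` distortion, and its threshold are all hypothetical values chosen for illustration.

```python
# Illustrative sketch only: the outcome distribution never changes over time,
# yet a distorted perception of likelihood hides a rare catastrophic outcome.

# Hypothetical stationary outcomes for one action: (probability, reward).
OUTCOMES = [(0.999, 1.0), (0.001, -5000.0)]


def expected_value(outcomes, weight=lambda p: p, value=lambda r: r):
    """Expected value under a probability weighting and a value function.

    With the default identity functions this is the true expected value.
    """
    return sum(weight(p) * value(r) for p, r in outcomes)


def underweight_rare(p, threshold=0.01):
    """Hypothetical distortion: outcomes rarer than `threshold` are ignored."""
    return 0.0 if p < threshold else p


true_ev = expected_value(OUTCOMES)
perceived_ev = expected_value(OUTCOMES, weight=underweight_rare)

# The action looks beneficial under the distorted perception,
# although its true expected value is negative.
print(f"true expected value:      {true_ev:.3f}")
print(f"perceived expected value: {perceived_ev:.3f}")
```

Under these assumed numbers the perceived expected value is positive while the true one is negative, even though nothing in the environment changes. A value distortion could be explored in the same way by passing a different `value` function.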