Threshold Crossings as Tail Events for Catastrophic AI Risk
This paper addresses the monitoring and mitigation of catastrophic AI risk and is aimed at AI safety researchers.
The paper analyzes how random fluctuations of a control parameter near a catastrophic threshold in AI systems can generate heavy-tailed outcome distributions, and identifies the circumstances in which the probability of a sudden large-scale transition aligns closely with the tail probability of the damage distribution.
We analyse the circumstances in which bifurcation-driven jumps in AI systems are associated with emergent heavy-tailed outcome distributions. By examining how random fluctuations of a control parameter near a catastrophic threshold generate extreme outcomes, we show when the probability of a sudden, large-scale transition aligns closely with the tail probability of the resulting damage distribution. Our results contribute to research on the monitoring, mitigation and control of AI systems aimed at managing potentially catastrophic AI risk.
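The alignment between crossing probability and tail probability can be illustrated with a minimal Monte Carlo sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the control parameter is Gaussian with mean `mu` and standard deviation `sigma`, the threshold `c_star` and jump size `jump` are arbitrary, and the damage follows a fold-like rule (small baseline noise below the threshold, a discrete jump plus square-root growth above it). The function names `simulate` and `tail_probability` are hypothetical.

```python
import math
import random


def simulate(n=100_000, mu=0.0, sigma=1.0, c_star=2.0, jump=10.0, seed=0):
    """Toy model (an assumption, not the paper's model): a Gaussian control
    parameter whose damage jumps discontinuously once it exceeds c_star."""
    rng = random.Random(seed)
    crossings = 0
    damages = []
    for _ in range(n):
        c = rng.gauss(mu, sigma)  # fluctuating control parameter
        if c > c_star:
            crossings += 1
            # Post-threshold branch: jump plus square-root growth past c_star.
            d = jump + math.sqrt(c - c_star)
        else:
            # Pre-threshold branch: small baseline damage.
            d = abs(rng.gauss(0.0, 0.1))
        damages.append(d)
    return crossings / n, damages


def tail_probability(damages, level):
    """Empirical probability that damage exceeds the given level."""
    return sum(d > level for d in damages) / len(damages)


p_cross, damages = simulate()
# Any level between the baseline scale (~0.1) and the jump (10.0) separates the branches.
p_tail = tail_probability(damages, level=5.0)
# Closed-form Gaussian crossing probability P(c > c_star) for mu=0, sigma=1, c_star=2.
p_exact = 0.5 * math.erfc((2.0 - 0.0) / (1.0 * math.sqrt(2.0)))

print(f"P(threshold crossing) = {p_cross:.4f}")
print(f"P(damage > 5.0)       = {p_tail:.4f}")
print(f"Gaussian closed form  = {p_exact:.4f}")
```

In this toy setting the two empirical probabilities coincide because the damage level sits in the gap between the two branches; the sketch says nothing about regimes where baseline damage is itself large or where the jump is small relative to the noise.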