Provably safe systems: the only path to controllable AGI
This paper addresses the critical problem of ensuring that AGI is safe for humanity, but its contribution is incremental: it builds on existing ideas in AI safety and presents no new empirical results.
The paper argues that building Artificial General Intelligences (AGIs) with provable safety guarantees, obtained through formal verification and mechanistic interpretability, is the only path that ensures safe and controllable AGI. It also outlines challenge problems to advance this goal.
We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path that guarantees safe, controlled AGI. We end with a list of challenge problems whose solution would contribute to this positive outcome and invite readers to join in this work.