CY · AI · LG · Sep 5, 2023

Provably safe systems: the only path to controllable AGI

arXiv:2309.01933v1 · 40 citations · h-index: 86
Originality: Synthesis-oriented
AI Analysis

This paper addresses the critical problem of ensuring that AGI is safe for humanity, but it is incremental: it builds on existing ideas in AI safety and presents no new empirical results.

The paper argues that building Artificial General Intelligences (AGIs) with provable safety guarantees, obtained through formal verification and mechanistic interpretability, is the only path to safe and controllable AGI, and it outlines challenge problems to advance this goal.

We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path which guarantees safe controlled AGI. We end with a list of challenge problems whose solution would contribute to this positive outcome and invite readers to join in this work.

