AI · Feb 27, 2023

Safety without alignment

arXiv:2303.00752v2 · 1 citation · h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses AI safety for AGI development by offering an alternative, non-alignment-based framework. It appears incremental, however, because it builds on existing concepts from ethical rationalism.

The paper tackles AI safety by proposing an alternative to alignment with human values: an approach grounded in ethical rationalism and implemented via hybrid theorem provers running in a sandbox. The authors argue that, because AGIs become more rational over time, tying their ethics to their rationality offers better long-term safety than alignment does.

Currently, the dominant paradigm in AI safety is alignment with human values. Here we describe progress on developing an alternative approach to safety, based on ethical rationalism (Gewirth, 1978), and propose an inherently safe implementation path via hybrid theorem provers in a sandbox. As AGIs evolve, their alignment may fade, but their rationality can only increase (otherwise more rational ones would have a significant evolutionary advantage), so an approach that ties their ethics to their rationality has clear long-term advantages.

