Foundational Moral Values for AI Alignment
This work addresses the AI alignment problem for AI safety researchers by grounding alignment targets in a philosophically robust structure. The contribution is incremental: it builds on existing moral philosophy and offers no new technical implementation.
The paper tackles the AI alignment problem by proposing five foundational moral values, derived from moral philosophy, as clearer targets for aligning AI systems. It shows that these values also offer a framework for identifying threats and opportunities from AI.
Solving the AI alignment problem requires clear, defensible values towards which AI systems can be aligned. Currently, targets for alignment remain underspecified and do not appear to rest on a philosophically robust structure. We begin the discussion of this problem by presenting five core, foundational values, drawn from moral philosophy and built on the requisites for human existence: survival, sustainable intergenerational existence, society, education, and truth. We show that these values not only provide a clearer direction for technical alignment work, but also serve as a framework for highlighting the threats that AI systems pose to obtaining and sustaining these values, and the opportunities they offer for doing so.