LG · AI · NC · Jan 9, 2024

Concept Alignment

arXiv:2401.08672v1 · 19 citations · h-index: 12
Originality: Synthesis-oriented
AI Analysis

This work addresses a foundational problem in AI safety that matters to researchers and developers alike, and it proposes shifting the focus from value alignment to concept alignment. Its contribution is incremental: it synthesizes existing interdisciplinary ideas rather than introducing new methods or data.

The paper argues that AI alignment must first address concept alignment, ensuring that AI systems and humans share the same understanding of the concepts they use to make sense of the world, before tackling value alignment. It integrates insights from philosophy, cognitive science, and deep learning to outline the challenges and proposes leveraging tools already being developed in cognitive science and AI research to advance this goal.

Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world. We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment, not just value alignment, between humans and machines. We summarize existing accounts of how humans and machines currently learn concepts, and we outline opportunities and challenges in the path towards shared concepts. Finally, we explain how we can leverage the tools already being developed in cognitive science and AI research to accelerate progress towards concept alignment.
