AI · CY · SY · May 16, 2024

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

arXiv:2405.09794v2 · 9 citations · h-index: 32
Originality: Highly original
AI Analysis

This work addresses safety concerns for advanced AI technologies interacting with humans, presenting a novel interdisciplinary approach that is foundational rather than incremental.

The paper tackles the problem of ensuring safety in human-AI interactions by arguing that current methods focusing on static model outputs are insufficient, and it proposes a unifying formalism and technical roadmap to address dynamic feedback loops between AI outputs and human behavior.

Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human-AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human-AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.

