HC AI

Jan 12, 2022

The Concept of Criticality in AI Safety

arXiv:2201.04632v25.11 citations

Originality: Incremental advance

AI Analysis

This paper addresses the value alignment problem in AI safety by reducing the human operator's monitoring workload. The advance is incremental, since it builds on existing monitoring approaches.

The paper tackles the inefficiency of human monitoring in AI safety by proposing that AI agents request permission only for critical actions, defined as potentially harmful actions. This lets operators engage in other activities while maintaining safety.

When AI agents don't align their actions with human values, they may cause serious harm. One way to solve the value alignment problem is to include a human operator who monitors all of the agent's actions. Although this solution guarantees maximal safety, it is very inefficient, since it requires the human operator to dedicate all of his attention to the agent. In this paper, we propose a much more efficient solution that allows an operator to be engaged in other activities without neglecting his monitoring task. In our approach, the AI agent requests permission from the operator only for critical actions, that is, potentially harmful actions. We introduce the concept of critical actions with respect to AI safety and discuss how to build a model that measures action criticality. We also discuss how the operator's feedback could be used to make the agent smarter.
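
The permission mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `GatedAgent`, the callables `criticality` and `ask_operator`, the 0.5 threshold, and the toy scores are all hypothetical, chosen only to show how an agent might interrupt the operator for critical actions alone and log the operator's decisions for later use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class GatedAgent:
    """Hypothetical agent that asks the operator only about critical actions."""
    criticality: Callable[[str], float]   # assumed model: action -> score in [0, 1]
    ask_operator: Callable[[str], bool]   # assumed hook: True if the operator approves
    threshold: float = 0.5                # assumed cut-off, not taken from the paper
    feedback: List[Tuple[str, float, bool]] = field(default_factory=list)

    def step(self, action: str) -> bool:
        """Return True if the action may be executed."""
        score = self.criticality(action)
        if score < self.threshold:
            # Non-critical: proceed without interrupting the operator.
            return True
        # Critical: request permission and record the decision as feedback.
        approved = self.ask_operator(action)
        self.feedback.append((action, score, approved))
        return approved


# Toy stand-ins for the criticality model and the operator (illustrative only).
SCORES = {"read_sensor": 0.1, "delete_files": 0.9}
asked: List[str] = []


def toy_operator(action: str) -> bool:
    asked.append(action)
    return False  # this toy operator denies every request


# Unknown actions default to a score of 1.0, so they are treated as critical.
agent = GatedAgent(criticality=lambda a: SCORES.get(a, 1.0),
                   ask_operator=toy_operator)
results = [agent.step("read_sensor"), agent.step("delete_files")]
```

In this sketch the operator is consulted only for `delete_files`, and the recorded `feedback` entries are the kind of signal that could later be used to refine the criticality model.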
