AI | CY | Jul 2, 2023

Minimum Levels of Interpretability for Artificial Moral Agents

arXiv:2307.00660v1 | 7 citations | h-index: 7
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for interpretability in AI systems that make moral decisions, but it appears incremental: it builds on existing sub-field overviews and does not claim major breakthroughs.

The paper tackles the problem of ensuring trust and understanding in artificial moral agents (AMAs) by introducing the concept of Minimum Level of Interpretability (MLI) and recommending MLIs for different agent types to aid safe deployment.

As artificial intelligence (AI) models continue to scale up, they are becoming more capable and more deeply integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMAs), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI), and recommend an MLI for various types of agents to aid their safe deployment in real-world settings.
