LG, AI, CL, CY | Oct 25, 2021

What Would Jiminy Cricket Do? Towards Agents That Behave Morally

arXiv:2110.13136v2 | 75 citations
AI Analysis

This work addresses the need for generally capable agents to avoid inheriting biases from environments that teach immoral behavior, an incremental step towards safer AI systems.

The paper tackles the problem that artificial agents lack a moral sense and may therefore learn immoral behavior when trained on environments such as violent video games. It introduces Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of morally salient scenarios, to evaluate whether agents can act morally while maximizing reward. Extensive experiments show that an artificial conscience, built from models with commonsense moral knowledge, can steer agents towards moral behavior without sacrificing performance.

When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong. By contrast, artificial agents are currently not endowed with a moral sense. As a consequence, they may learn to behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environments, it will become necessary to mitigate inherited biases from environments that teach immoral behavior. To facilitate the development of agents that avoid causing wanton harm, we introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of diverse, morally salient scenarios. By annotating every possible game state, the Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward. Using models with commonsense moral knowledge, we create an elementary artificial conscience that assesses and guides agents. In extensive experiments, we find that the artificial conscience approach can steer agents towards moral behavior without sacrificing performance.
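As a rough illustration of how a conscience might "assess and guide" an agent, the sketch below lowers the estimated value of any candidate action that a moral-knowledge model flags as harmful, then picks the best remaining action. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the threshold, the penalty size, and the keyword-based scorer standing in for a learned model are all hypothetical.

```python
from typing import Callable, Dict


def shape_q_values(
    q_values: Dict[str, float],
    immorality_score: Callable[[str], float],
    threshold: float = 0.5,   # assumed cutoff for flagging an action
    penalty: float = 10.0,    # assumed size of the value penalty
) -> Dict[str, float]:
    """Subtract a fixed penalty from every action the conscience flags."""
    return {
        action: (q - penalty) if immorality_score(action) > threshold else q
        for action, q in q_values.items()
    }


def toy_immorality_score(action: str) -> float:
    """Hypothetical stand-in for a model with commonsense moral knowledge."""
    harmful_words = {"attack", "steal", "kill"}
    words = action.lower().split()
    return 1.0 if any(w in harmful_words for w in words) else 0.0


def choose_action(q_values: Dict[str, float]) -> str:
    """Greedy choice: the action with the highest estimated value."""
    return max(q_values, key=q_values.get)


# Made-up value estimates for three candidate text commands.
q = {"attack the guard": 2.0, "open the door": 1.5, "steal the lamp": 1.8}

unshaped_choice = choose_action(q)
shaped = shape_q_values(q, toy_immorality_score)
shaped_choice = choose_action(shaped)

print("without conscience:", unshaped_choice)
print("with conscience:", shaped_choice)
```

In this toy setup the reward-seeking choice changes from a flagged command to an unflagged one, while the values of unflagged commands are left untouched, which is the sense in which guidance need not cost performance on harmless actions.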

Code Implementations: 1 repo
