CY CL
May 25, 2022

Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy

arXiv:2205.12771v1643 citations
h-index: 41
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of aligning AI with human moral values and highlights potential biases in current training approaches. Its contribution is incremental: it builds on existing probing methods and does not introduce new solutions.

The study investigated whether the Delphi model learns consistent ethical principles from human-annotated moral scenarios, finding that it tends to reflect the moral views of the annotator demographics, though with some inconsistencies.

In an effort to guarantee that machine learning model outputs conform with human moral values, recent work has begun exploring the possibility of explicitly training models to learn the difference between right and wrong. This is typically done in a bottom-up fashion, by exposing the model to different scenarios, annotated with human moral judgements. One question, however, is whether the trained models actually learn any consistent, higher-level ethical principles from these datasets -- and if so, what? Here, we probe the Allen AI Delphi model with a set of standardized morality questionnaires, and find that, despite some inconsistencies, Delphi tends to mirror the moral principles associated with the demographic groups involved in the annotation process. We question whether this is desirable and discuss how we might move forward with this knowledge.

