AI · CL · Jul 7, 2024

Some Issues in Predictive Ethics Modeling: An Annotated Contrast Set of "Moral Stories"

arXiv:2407.05244v12.3 · h-index: 3 · Has Code
Originality: Incremental advance
AI Analysis

This work examines overfitting and data misrepresentation in ethics models and offers AI researchers practical recommendations for improving robustness.

The paper challenges the use of accuracy as a holistic metric for ethics modeling by showing that small changes to the input can sharply reduce classifier performance. Label-changing tweaks of 3-5 words cut accuracy from 99.8% to as low as 51%, adding textual bias lowers it to 77%, and pairing a situation with a misleading social norm lowers it to 98.8%. The paper reads these drops as evidence of overfitting.

Models like Delphi have been able to label ethical dilemmas as moral or immoral with astonishing accuracy. This paper challenges accuracy as a holistic metric for ethics modeling by identifying issues with translating moral dilemmas into text-based input. It demonstrates these issues with contrast sets that substantially reduce the performance of classifiers trained on the dataset Moral Stories. Ultimately, we obtain concrete estimates for how much specific forms of data misrepresentation harm classifier accuracy. Specifically, label-changing tweaks to the descriptive content of a situation (as small as 3-5 words) can reduce classifier accuracy to as low as 51%, almost half the initial accuracy of 99.8%. Associating situations with a misleading social norm lowers accuracy to 98.8%, while adding textual bias (i.e. an implication that a situation already fits a certain label) lowers accuracy to 77%. These results suggest not only that many ethics models have substantially overfit, but that several precautions are required to ensure that input accurately captures a moral dilemma. This paper recommends re-examining the structure of a social norm, training models to ask for context with defeasible reasoning, and filtering input for textual bias. Doing so not only gives us the first concrete estimates of the average cost to accuracy of misrepresenting ethics data, but gives researchers practical tips for considering these estimates in research.
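To make the reported figures easier to compare, the following minimal sketch computes the drop in percentage points for each contrast set named in the abstract. The four accuracy values come from the abstract. The dictionary keys, the function names, and the toy predictions at the end are illustrative assumptions and are not taken from the paper's code.

```python
# Accuracies reported in the abstract, in percent.
BASELINE_ACCURACY = 99.8
CONTRAST_SET_ACCURACY = {
    "label-changing tweaks": 51.0,   # "as low as" figure from the abstract
    "misleading social norm": 98.8,
    "textual bias": 77.0,
}


def accuracy_drop(baseline: float, perturbed: float) -> float:
    """Absolute drop in percentage points, rounded to one decimal place."""
    return round(baseline - perturbed, 1)


def accuracy(predictions: list, labels: list) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Drop per contrast set, relative to the reported baseline.
drops = {
    name: accuracy_drop(BASELINE_ACCURACY, acc)
    for name, acc in CONTRAST_SET_ACCURACY.items()
}

# Hypothetical toy data: a classifier that keeps its original answer after a
# label-changing edit is wrong on every edited example it failed to notice.
toy_gold = ["immoral", "moral", "immoral", "moral"]
toy_predictions = ["moral", "moral", "moral", "moral"]
toy_accuracy = accuracy(toy_predictions, toy_gold)

if __name__ == "__main__":
    for name, drop in drops.items():
        print(f"{name}: -{drop} points")
    print(f"toy accuracy: {toy_accuracy:.2f}")
```

On these figures, label-changing tweaks account for by far the largest drop, while a misleading social norm costs about one point. If the moral and immoral labels are roughly balanced (an assumption not stated in the abstract), 51% is close to chance for a two-way classifier.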

Code Implementations: 1 repo