An Evaluation of GPT-4 on the ETHICS Dataset
This work assesses AI alignment with human ethics, but it is incremental because it evaluates an existing model on an existing dataset.
The study evaluated GPT-4 on the ETHICS dataset, finding that its performance is much better than previous models, suggesting that learning common human values is not the hard problem for AI ethics.
This report summarizes a short study of the performance of GPT-4 on the ETHICS dataset. The ETHICS dataset consists of five sub-datasets covering different fields of ethics: Justice, Deontology, Virtue Ethics, Utilitarianism, and Commonsense Ethics. The moral judgments were curated to have a high degree of agreement, with the aim of representing shared human values rather than moral dilemmas. GPT-4's performance is much better than that of previous models, which suggests that learning to work with common human values is not the hard problem for AI ethics.
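As a rough illustration of how performance on the five sub-datasets could be tallied, the sketch below computes accuracy per sub-dataset from model predictions and reference labels. It is not the study's evaluation code: the function name `accuracy_by_subset`, the `(subset, predicted, gold)` record layout, and the toy records are all assumptions made for this example, and some sub-datasets may need a different scoring scheme than simple label matching.

```python
from collections import defaultdict


def accuracy_by_subset(records):
    """Return accuracy per sub-dataset.

    `records` is an iterable of (subset, predicted_label, gold_label)
    tuples. This layout is hypothetical, chosen only for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subset, predicted, gold in records:
        total[subset] += 1
        if predicted == gold:
            correct[subset] += 1
    # Only sub-datasets that appear in `records` are reported.
    return {subset: correct[subset] / total[subset] for subset in total}


# Made-up toy records for illustration only; these are NOT results
# from the study.
toy_records = [
    ("justice", 1, 1),
    ("justice", 0, 1),
    ("commonsense", 0, 0),
]
scores = accuracy_by_subset(toy_records)
```

Reporting one score per sub-dataset, rather than a single pooled number, keeps a strong result on one field of ethics from masking a weaker result on another.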