CL · Dec 19, 2024

LLMs as mediators: Can they diagnose conflicts accurately?

arXiv:2412.14675v1 · 3 citations · h-index: 41 · ACM Journal on Computing and Sustainable Societies
Originality: Incremental advance
AI Analysis

This paper addresses the problem of using LLMs for conflict mediation by testing whether they can diagnose the source of a disagreement. The advance is incremental because it builds on prior human studies.

The study tested whether GPT 3.5 and GPT 4 can diagnose conflicts by distinguishing between causal and moral disagreements. Both LLMs understand the distinction much as humans do, but in the moral misalignment condition they tend to overestimate causal disagreement and underestimate moral disagreement. The tendency is strongest for GPT 4 on the proximate scale.

Prior research indicates that to be able to mediate conflict, observers of disagreements between parties must be able to reliably distinguish the sources of their disagreement as stemming from differences in beliefs about what is true (causality) vs. differences in what they value (morality). In this paper, we test whether OpenAI's Large Language Models GPT 3.5 and GPT 4 can perform this task and whether one or the other type of disagreement proves particularly challenging for LLMs to diagnose. We replicate Study 1 in Koçak et al. (2003), which employs a vignette design, with OpenAI's GPT 3.5 and GPT 4. We find that both LLMs have a semantic understanding of the distinction between causal and moral codes similar to that of humans and can reliably distinguish between them. When asked to diagnose the source of disagreement in a conversation, both LLMs, compared to humans, tend to overestimate the extent of causal disagreement and underestimate the extent of moral disagreement in the moral misalignment condition. This tendency is especially pronounced for GPT 4 when using a proximate scale that relies on concrete language specific to an issue. GPT 3.5 does not perform as well as GPT 4 or humans when using either the proximate or the distal scale. The study provides a first test of the potential for using LLMs to mediate conflict by diagnosing the root of disagreements in causal and evaluative codes.

