CL · Jun 23, 2025

MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Multi-hop Hate Speech Explanation

Jackson Trager, Francielle Vargas, Diego Alves, Matteo Guida, Mikel K. Ngueajio, Ameeta Agrawal, Yalda Daryani, Farzan Karimi-Malekabadi, Flor Miriam Plaza-del-Arco

arXiv:2506.19073v4 · 5 citations · h-index: 13 · EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses the need for better benchmarks for evaluating LLMs on socially sensitive tasks, particularly benchmarks that are multilingual and make moral reasoning transparent. The advance is incremental because it builds on existing datasets and theories.

The paper introduces MFTCXplain, a multilingual dataset of 3,000 tweets annotated with hate speech labels, moral categories, and rationales, as a benchmark for evaluating moral reasoning in LLMs. It finds that LLMs perform well on hate speech detection (F1 up to 0.836) but poorly on predicting moral sentiments (F1 < 0.35) and on rationale alignment.

Ensuring the moral reasoning capabilities of Large Language Models (LLMs) is a growing concern as these systems are used in socially sensitive tasks. Nevertheless, current evaluation benchmarks present two major shortcomings: a lack of annotations that justify moral classifications, which limits transparency and interpretability; and a predominant focus on English, which constrains the assessment of moral reasoning across diverse cultural settings. In this paper, we introduce MFTCXplain, a multilingual benchmark dataset for evaluating the moral reasoning of LLMs via multi-hop hate speech explanation using the Moral Foundations Theory. MFTCXplain comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Our results show a misalignment between LLM outputs and human annotations in moral reasoning tasks. While LLMs perform well in hate speech detection (F1 up to 0.836), their ability to predict moral sentiments is notably weak (F1 < 0.35). Furthermore, rationale alignment remains limited mainly in underrepresented languages. Our findings show the limited capacity of current LLMs to internalize and reflect human moral reasoning.
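To make the annotation layers and the reported F1 figures concrete, here is a minimal sketch. The record layout and all field names (`AnnotatedTweet`, `is_hate`, `moral_labels`, `rationale_spans`) are hypothetical and are not taken from the dataset's actual release format. The abstract does not say which F1 variant is reported, so plain binary F1 on the hate speech label is assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedTweet:
    # Hypothetical schema: field names are illustrative, not the dataset's own.
    text: str
    language: str                      # e.g. "pt", "it", "fa", "en"
    is_hate: bool                      # binary hate speech label
    moral_labels: list[str] = field(default_factory=list)           # moral categories
    rationale_spans: list[tuple[int, int]] = field(default_factory=list)  # (start, end) character offsets


def f1_score(gold: list[bool], pred: list[bool]) -> float:
    """Binary F1: harmonic mean of precision and recall on the positive class."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy placeholder labels, not real data from the benchmark.
gold_labels = [True, True, False, False]
model_labels = [True, False, True, False]
print(f1_score(gold_labels, model_labels))
```

The same function could be applied per moral category (treating each category as its own binary label) to get per-category scores, though how the paper aggregates them is not specified in this abstract.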
