Moral Reasoning Across Languages: The Critical Role of Low-Resource Languages in LLMs
This paper addresses the challenge of ensuring equitable AI performance across languages, particularly low-resource ones. The contribution is incremental, since it builds on existing multilingual benchmarks and fine-tuning methods.
The paper evaluates moral reasoning in large language models across languages. It finds that performance degrades as context complexity increases, especially for low-resource languages such as Vietnamese. Its fine-tuning experiments further indicate that low-resource languages have a stronger impact on multilingual reasoning than high-resource ones.
In this paper, we introduce the Multilingual Moral Reasoning Benchmark (MMRB) to evaluate the moral reasoning abilities of large language models (LLMs) across five typologically diverse languages and three levels of contextual complexity: sentence, paragraph, and document. Our results show that moral reasoning performance degrades with increasing context complexity, particularly for low-resource languages such as Vietnamese. We further fine-tune the open-source LLaMA-3-8B model on curated monolingual data for both alignment and poisoning. Surprisingly, low-resource languages have a stronger impact on multilingual reasoning than high-resource ones, highlighting their critical role in multilingual NLP.
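As a rough illustration of how results on a language-by-context-level grid like the one described above might be tallied, the following Python sketch computes accuracy per (language, level) cell and the drop from sentence-level to document-level accuracy for one language. The record fields (`language`, `level`, `prediction`, `label`), the function names, and the toy records are assumptions made for illustration; they are not MMRB's actual data format or the authors' evaluation code.

```python
from collections import defaultdict

# The three context levels named in the abstract.
LEVELS = ("sentence", "paragraph", "document")


def accuracy_by_cell(records):
    """Return {(language, level): accuracy} for a list of result records.

    Each record is a dict with hypothetical keys:
    'language', 'level', 'prediction', 'label'.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        if r["level"] not in LEVELS:
            raise ValueError(f"unknown context level: {r['level']}")
        key = (r["language"], r["level"])
        totals[key] += 1
        correct[key] += int(r["prediction"] == r["label"])
    return {key: correct[key] / totals[key] for key in totals}


def degradation(acc, language):
    """Accuracy drop from sentence-level to document-level for one language."""
    return acc[(language, "sentence")] - acc[(language, "document")]


# Toy records, invented purely to exercise the functions above.
toy_records = [
    {"language": "vi", "level": "sentence", "prediction": "A", "label": "A"},
    {"language": "vi", "level": "sentence", "prediction": "B", "label": "B"},
    {"language": "vi", "level": "document", "prediction": "A", "label": "B"},
    {"language": "vi", "level": "document", "prediction": "B", "label": "B"},
]

toy_acc = accuracy_by_cell(toy_records)
toy_drop = degradation(toy_acc, "vi")
```

A positive value from `degradation` would correspond to the pattern the abstract reports, where accuracy falls as context grows from sentence to document. The toy numbers here carry no empirical meaning.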