CL · Mar 19

GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms

arXiv:2603.18469 · 71.9 · h-index: 4
Predicted impact: top 88% in CL · last 90 days · Originality: Incremental advance
AI Analysis

The paper addresses the need for better benchmarks to assess LLM decision-making in complex business applications. The advance is incremental, since it builds on existing evaluation frameworks.

The paper evaluates how large language models balance norms against business goals in real-world scenarios. It introduces the GAIN benchmark, with 1,200 scenarios across four domains, and finds that advanced LLMs often mirror human decision-making patterns but diverge from them under Personal Incentive pressure, where they adhere strongly to norms.

We introduce GAIN (Goal-Aligned Decision-Making under Imperfect Norms), a benchmark designed to evaluate how large language models (LLMs) balance adherence to norms against business goals. Existing benchmarks typically focus on abstract scenarios rather than real-world business applications. Furthermore, they provide limited insights into the factors influencing LLM decision-making. This restricts their ability to measure models' adaptability to complex, real-world norm-goal conflicts. In GAIN, models receive a goal, a specific situation, a norm, and additional contextual pressures. These pressures, explicitly designed to encourage potential norm deviations, are a unique feature that differentiates GAIN from other benchmarks, enabling a systematic evaluation of the factors influencing decision-making. We define five types of pressures: Goal Alignment, Risk Aversion, Emotional/Ethical Appeal, Social/Authoritative Influence, and Personal Incentive. The benchmark comprises 1,200 scenarios across four domains: hiring, customer support, advertising and finance. Our experiments show that advanced LLMs frequently mirror human decision-making patterns. However, when Personal Incentive pressure is present, they diverge significantly, showing a strong tendency to adhere to norms rather than deviate from them.
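The scenario structure described in the abstract (a goal, a situation, a norm, and a contextual pressure) can be sketched as a small data model. This is a minimal illustration only, not the authors' schema or evaluation code: the names `Scenario`, `build_prompt`, and `adherence_rate_by_pressure`, the prompt layout, and the simplification of one pressure per scenario are all assumptions. Only the five pressure types and four domains come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Pressure(Enum):
    """The five pressure types named in the abstract."""
    GOAL_ALIGNMENT = "Goal Alignment"
    RISK_AVERSION = "Risk Aversion"
    EMOTIONAL_ETHICAL_APPEAL = "Emotional/Ethical Appeal"
    SOCIAL_AUTHORITATIVE_INFLUENCE = "Social/Authoritative Influence"
    PERSONAL_INCENTIVE = "Personal Incentive"


# The four domains named in the abstract.
DOMAINS = ("hiring", "customer support", "advertising", "finance")


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container; field names are assumptions, not the paper's."""
    domain: str
    goal: str
    situation: str
    norm: str
    pressure_type: Pressure  # assumption: one pressure per scenario
    pressure_text: str

    def __post_init__(self):
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")


def build_prompt(s: Scenario) -> str:
    """Assemble a plain-text prompt; this layout is an assumption."""
    return "\n".join([
        f"Goal: {s.goal}",
        f"Situation: {s.situation}",
        f"Norm: {s.norm}",
        f"Context ({s.pressure_type.value}): {s.pressure_text}",
        "Decide whether to follow the norm or deviate from it.",
    ])


def adherence_rate_by_pressure(records):
    """Fraction of norm-adhering decisions per pressure type.

    `records` is an iterable of (Pressure, adhered: bool) pairs.
    """
    totals = defaultdict(int)
    adhered = defaultdict(int)
    for pressure, did_adhere in records:
        totals[pressure] += 1
        adhered[pressure] += int(did_adhere)
    return {p: adhered[p] / totals[p] for p in totals}
```

Grouping adherence rates by pressure type is one plausible way to compare model and human decisions per pressure, which is the kind of comparison the abstract reports for Personal Incentive.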

