CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space
The work addresses the need for better benchmarks to assess LLMs as decision-support tools in high-stakes domains, though the contribution is incremental because it builds on existing benchmark frameworks.
The authors address the problem that existing decision-making benchmarks for large language models rely on unrealistic assumptions about actions and conditions. They introduce CONDESION-BENCH to evaluate conditional decision-making in compositional action spaces, using oracle-based evaluation of both decision quality and condition adherence to provide a more rigorous assessment.
Large language models have been widely explored as decision-support tools in high-stakes domains due to their contextual understanding and reasoning capabilities. However, existing decision-making benchmarks rely on two simplifying assumptions: actions are selected from a finite set of pre-defined candidates, and explicit conditions restricting action feasibility are not incorporated into the decision-making process. These assumptions fail to capture the compositional structure of real-world actions and the explicit conditions that constrain their validity. To address these limitations, we introduce CONDESION-BENCH, a benchmark designed to evaluate conditional decision-making in compositional action space. In CONDESION-BENCH, actions are defined as allocations to decision variables and are restricted by explicit conditions at the variable, contextual, and allocation levels. By employing oracle-based evaluation of both decision quality and condition adherence, we provide a more rigorous assessment of LLMs as decision-support tools.
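To make the idea of an action as an allocation to decision variables more concrete, the following is a minimal sketch, not the benchmark's actual implementation. The scenario (a portfolio split), the names `Condition` and `violated_conditions`, and the specific reading of the variable, contextual, and allocation levels are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An action is an allocation: a value assigned to each decision variable.
Allocation = Dict[str, float]


@dataclass
class Condition:
    """Hypothetical representation of one explicit condition."""
    level: str  # "variable", "contextual", or "allocation" (assumed reading)
    description: str
    check: Callable[[Allocation, dict], bool]  # True when the condition holds


def violated_conditions(action: Allocation, context: dict,
                        conditions: List[Condition]) -> List[str]:
    """Return descriptions of every condition the action fails to satisfy."""
    return [c.description for c in conditions if not c.check(action, context)]


# Illustrative conditions, one per level (all invented for this sketch).
conditions = [
    # Variable level: bounds on each individual decision variable.
    Condition("variable", "each share is between 0 and 0.6",
              lambda a, ctx: all(0.0 <= v <= 0.6 for v in a.values())),
    # Contextual level: a restriction that depends on the scenario.
    Condition("contextual", "no bonds when interest rates are rising",
              lambda a, ctx: not ctx["rates_rising"] or a["bonds"] == 0.0),
    # Allocation level: a restriction on the allocation as a whole.
    Condition("allocation", "shares sum to 1",
              lambda a, ctx: abs(sum(a.values()) - 1.0) < 1e-9),
]

context = {"rates_rising": True}
feasible = {"stocks": 0.6, "bonds": 0.0, "cash": 0.4}
infeasible = {"stocks": 0.7, "bonds": 0.2, "cash": 0.1}

print(violated_conditions(feasible, context, conditions))
print(violated_conditions(infeasible, context, conditions))
```

In this sketch, condition adherence reduces to an empty list of violations. Decision quality would be scored separately, for example by comparing a feasible allocation against an oracle's allocation, which is not modeled here.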