CL · AI · Apr 10

CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space

Yeonjun Hwang, Sungyong Park, Minju Kim, Dongha Lee, Jinyoung Yeo

arXiv:2604.09029 · 68.1 · h-index: 13

AI Analysis

This work addresses the need for better benchmarks to assess LLMs as decision-support tools in high-stakes domains, though its contribution is incremental because it builds on existing benchmark frameworks.

The authors tackle a problem with existing decision-making benchmarks for large language models: they rely on unrealistic assumptions about actions and conditions. The authors introduce CONDESION-BENCH to evaluate conditional decision-making in compositional action spaces. It provides a more rigorous assessment by using oracle-based evaluation of both decision quality and condition adherence.

Large language models have been widely explored as decision-support tools in high-stakes domains due to their contextual understanding and reasoning capabilities. However, existing decision-making benchmarks rely on two simplifying assumptions: actions are selected from a finite set of pre-defined candidates, and explicit conditions restricting action feasibility are not incorporated into the decision-making process. These assumptions fail to capture the compositional structure of real-world actions and the explicit conditions that constrain their validity. To address these limitations, we introduce CONDESION-BENCH, a benchmark designed to evaluate conditional decision-making in compositional action space. In CONDESION-BENCH, actions are defined as allocations to decision variables and are restricted by explicit conditions at the variable, contextual, and allocation levels. By employing oracle-based evaluation of both decision quality and condition adherence, we provide a more rigorous assessment of LLMs as decision-support tools.
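The Python sketch below makes the abstract's setup concrete under stated assumptions. It is not the authors' benchmark code. The following are all hypothetical: the toy budget scenario, the `Condition` class, and the `oracle`, `adherence`, and `decision_quality` helpers. Our reading of the three condition levels is also an assumption. We take a variable-level condition to bound a single decision variable, a contextual condition to be implied by the scenario, and an allocation-level condition to bound the joint allocation. The oracle brute-forces every allocation, which only works for tiny spaces. Scoring infeasible actions as zero quality is our own design choice.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional

# Hypothetical encoding: an action is an allocation of integer amounts to decision variables.
Allocation = Dict[str, int]


@dataclass
class Condition:
    level: str  # "variable", "contextual", or "allocation" (our interpretation of the levels)
    description: str
    check: Callable[[Allocation], bool]


def is_feasible(action: Allocation, conditions: List[Condition]) -> bool:
    """An action is feasible only if it satisfies every explicit condition."""
    return all(c.check(action) for c in conditions)


def adherence(action: Allocation, conditions: List[Condition]) -> float:
    """Fraction of explicit conditions the action satisfies."""
    return sum(c.check(action) for c in conditions) / len(conditions)


def oracle(domains: Dict[str, range], conditions: List[Condition],
           utility: Callable[[Allocation], float]) -> Optional[Allocation]:
    """Enumerate the whole compositional space and return the best feasible allocation."""
    names = list(domains)
    best, best_u = None, float("-inf")
    for values in product(*(domains[n] for n in names)):
        action = dict(zip(names, values))
        if is_feasible(action, conditions) and utility(action) > best_u:
            best, best_u = action, utility(action)
    return best


def decision_quality(action: Allocation, domains: Dict[str, range],
                     conditions: List[Condition],
                     utility: Callable[[Allocation], float]) -> float:
    """Utility relative to the oracle optimum (assumed positive); infeasible actions score 0."""
    best = oracle(domains, conditions, utility)
    if best is None:
        raise ValueError("no feasible allocation exists")
    if not is_feasible(action, conditions):
        return 0.0
    return utility(action) / utility(best)


# Toy scenario (entirely hypothetical): allocate budget units across three programs.
domains = {"ads": range(6), "research": range(6), "support": range(6)}
conditions = [
    Condition("variable", "research gets at least 1 unit", lambda a: a["research"] >= 1),
    Condition("contextual", "staff shortage: support gets at least 2 units",
              lambda a: a["support"] >= 2),
    Condition("allocation", "total spend is at most 8 units", lambda a: sum(a.values()) <= 8),
]


def utility(a: Allocation) -> float:
    return 3 * a["ads"] + 2 * a["research"] + a["support"]


best = oracle(domains, conditions, utility)
llm_action = {"ads": 5, "research": 3, "support": 0}  # ignores the contextual condition
print(best, utility(best))  # → {'ads': 5, 'research': 1, 'support': 2} 19
print(adherence(llm_action, conditions), decision_quality(llm_action, domains, conditions, utility))
```

The hypothetical LLM answer satisfies two of the three conditions, so its adherence is 2/3. It is still infeasible, so it receives zero decision quality, even though its raw utility (21) exceeds the oracle's 19. Reporting adherence and quality separately makes this trade-off visible, whereas a single score over a finite candidate list would hide it.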

