CALRK-Bench: Evaluating Context-Aware Legal Reasoning in Korean Law
This work addresses the need for better evaluation of AI legal reasoning, which matters to Korean law practitioners and researchers. The contribution is incremental: it introduces a new benchmark rather than a novel method.
The authors propose CALRK-Bench, a benchmark for evaluating context-aware legal reasoning in Korean law. It assesses models on tasks such as identifying the temporal validity of norms and understanding shifts in legal judgments. Recent large language models consistently perform poorly on these tasks.
Legal reasoning requires not only the application of legal rules but also an understanding of the context in which those rules operate. However, existing legal benchmarks primarily evaluate rule application under the assumption of fixed norms, and thus fail to capture situations where legal judgments shift or where multiple norms interact. In this work, we propose CALRK-Bench, a context-aware legal reasoning benchmark based on the Korean legal system. CALRK-Bench evaluates whether models can identify the temporal validity of legal norms, determine whether sufficient legal information is available for a given case, and understand the reasons behind shifts in legal judgments. The dataset is constructed from legal precedents and legal consultation records, and is validated by legal experts. Experimental results show that even recent large language models consistently perform poorly on all three tasks. CALRK-Bench thus provides a new stress test for context-aware legal reasoning, as opposed to simple memorization of legal knowledge. Our code is available at https://github.com/jhCOR/CALRKBench.