CL, AI | Aug 24, 2025

ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

Siying Zhou, Yiquan Wu, Hui Chen, Xavier Hu, Kun Kuang, Adam Jatowt, Ming Hu, Chunyan Zheng, Fei Wu

arXiv:2508.17234v2 | 1 citation | h-index: 10 | EMNLP

Originality: Synthesis-oriented

AI Analysis

The paper addresses a novel task in legal AI, assisting non-professionals, but its contribution is incremental: it focuses on dataset creation and evaluation and proposes no new methods.

The paper tackles the problem of generating legal claims from case facts for non-professionals, constructing the first Chinese dataset (ClaimGen-CN) and evaluating state-of-the-art models, which show limitations in factual precision and clarity.

Legal claims are the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. While many works have focused on improving the efficiency of legal professionals, research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of generating legal claims from the facts of a given case. First, we construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task, from various real-world legal disputes. Additionally, we design an evaluation metric tailored to assessing the generated claims, which covers two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.
