CL | Jun 10, 2024

AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts

arXiv:2406.06809v1 | 29 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for openly available annotated datasets in legal AI, serving both researchers and practitioners. Its contribution is incremental, however, since it applies existing methods to a new domain-specific task.

The paper tackles the problem of detecting potentially void clauses in German consumer contracts by introducing AGB-DE, a corpus of 3,764 expert-annotated clauses, and finds that no approach exceeds an F1-score of 0.54, with GPT-3.5 outperforming others in recall.

Legal tasks and datasets are often used as benchmarks for the capabilities of language models. However, openly available annotated datasets are rare. In this paper, we introduce AGB-DE, a corpus of 3,764 clauses from German consumer contracts that have been annotated and legally assessed by legal experts. Together with the data, we present a first baseline for the task of detecting potentially void clauses, comparing the performance of an SVM baseline with three fine-tuned open language models and the performance of GPT-3.5. Our results show the challenging nature of the task, with no approach exceeding an F1-score of 0.54. While the fine-tuned models often performed better with regard to precision, GPT-3.5 outperformed the other approaches with regard to recall. An analysis of the errors indicates that one of the main challenges could be the correct interpretation of complex clauses, rather than the decision boundaries of what is permissible and what is not.
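The precision/recall contrast described above can be made concrete with a small worked example. The sketch below is illustrative only: the labels are hypothetical toy data (not drawn from AGB-DE), the function name `precision_recall_f1` is made up for this example, and it assumes F1 is computed on the positive ("potentially void") class, which the abstract does not specify.

```python
def precision_recall_f1(gold, pred, positive=1):
    """Return (precision, recall, F1) for the `positive` class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels: 1 = potentially void, 0 = valid.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # flags few clauses
eager = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]     # flags many clauses

for name, pred in [("cautious", cautious), ("eager", eager)]:
    p, r, f1 = precision_recall_f1(gold, pred)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy setup the cautious classifier has perfect precision but misses half of the void clauses, while the eager one finds all of them at the cost of many false alarms. Both end up with the same F1, which is why the score alone does not show which kind of error a model makes.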

Code Implementations: 1 repo