AI · Mar 24

Where Experts Disagree, Models Fail: Detecting Implicit Legal Citations in French Court Decisions

arXiv:2603.22973 · 36.6 · h-index: 1
AI Analysis

The paper addresses the challenge of analyzing law at scale for legal scholars, but its contribution is incremental: it focuses on a single domain (the French Civil Code in first-instance decisions) and a single benchmark.

The study tackles the detection of implicit legal citations in French court decisions, a task that requires distinguishing legal reasoning from mere semantic similarity. A supervised ensemble reaches 77% accuracy (F1 = 0.70), and an unsupervised top-k ranking setting reaches 76% precision at k = 200.
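
The top-k reframing can be illustrated with a short Python sketch. It assumes that each model assigns every candidate passage-article pair a relevance score in [0, 1] and that "consensus" means averaging those scores across models; the paper's actual consensus rule (for example, a vote count) may differ, and the model names and scores below are hypothetical toy values. Precision at k is the share of gold-positive pairs among the k highest-ranked ones.

```python
from statistics import mean


def consensus_scores(model_scores: dict[str, list[float]]) -> list[float]:
    """Average each candidate pair's score across models.

    Assumption: consensus = mean score; a vote count is an equally plausible rule.
    """
    columns = list(model_scores.values())
    n_pairs = len(columns[0])
    if any(len(col) != n_pairs for col in columns):
        raise ValueError("all models must score the same candidate pairs")
    return [mean(col[i] for col in columns) for i in range(n_pairs)]


def precision_at_k(scores: list[float], gold: list[bool], k: int) -> float:
    """Share of the k highest-scored pairs that are gold positives."""
    if not 0 < k <= len(scores):
        raise ValueError("k must lie between 1 and the number of candidates")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(gold[i] for i in ranked[:k]) / k


# Hypothetical scores from three unsupervised models for four candidate pairs.
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.4],
    "model_b": [0.7, 0.1, 0.9, 0.6],
    "model_c": [0.8, 0.3, 0.6, 0.5],
}
gold = [True, False, False, True]  # hypothetical expert labels for the same pairs
scores = consensus_scores(model_scores)
print(precision_at_k(scores, gold, k=2))  # 0.5
```

On real data, k is the number of top-ranked pairs inspected (200 in the reported result). Averaging favors pairs that every model scores highly, which is one simple way to operationalize agreement between models.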

Computational methods applied to legal scholarship hold the promise of analyzing law at scale. We start from a simple question: how often do courts implicitly apply statutory rules? This requires distinguishing legal reasoning from semantic similarity. We focus on implicit citation of the French Civil Code in first-instance court decisions and introduce a benchmark of 1,015 passage-article pairs annotated by three legal experts. We show that expert disagreement predicts model failures. Inter-annotator agreement is moderate ($\kappa = 0.33$) with 43% of disagreements involving the boundary between factual description and legal reasoning. Our supervised ensemble achieves F1 = 0.70 (77% accuracy), but this figure conceals an asymmetry: 68% of false positives fall on the 33% of cases where the annotators disagreed. Despite these limits, reframing the task as top-k ranking and leveraging multi-model consensus yields 76% precision at k = 200 in an unsupervised setting. Moreover, the remaining false positives tend to surface legally ambiguous applications rather than obvious errors.
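
The agreement analysis in the abstract can be sketched the same way. The abstract does not say which κ statistic was used; the sketch below assumes Fleiss' kappa, a common choice for three or more annotators, and assumes gold labels are the majority vote of the three experts. It also computes the share of a model's false positives that land on items where the annotators disagreed, the kind of quantity behind the 68% figure. The labels and predictions are hypothetical toy data, not the benchmark.

```python
from collections import Counter


def fleiss_kappa(labels: list[list[int]]) -> float:
    """Fleiss' kappa; labels[i] lists the label each annotator gave item i."""
    if not labels:
        raise ValueError("need at least one item")
    n_items, n_raters = len(labels), len(labels[0])
    if n_raters < 2 or any(len(row) != n_raters for row in labels):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(row) for row in labels]
    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items
    # Chance agreement from the pooled proportion of each label.
    total = n_items * n_raters
    categories = set().union(*counts)
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating uses one label")
    return (p_bar - p_e) / (1 - p_e)


def majority(row: list[int]) -> int:
    """Assumed gold label: majority vote (no ties with three binary annotators)."""
    return Counter(row).most_common(1)[0][0]


def fp_share_on_disagreement(labels, predictions, gold) -> float:
    """Share of false positives that fall on items the annotators disagreed on."""
    disagreed = [len(set(row)) > 1 for row in labels]
    false_pos = [i for i, (p, g) in enumerate(zip(predictions, gold)) if p and not g]
    if not false_pos:
        return 0.0
    return sum(disagreed[i] for i in false_pos) / len(false_pos)


# Hypothetical toy data: 1 = the passage implicitly applies the article.
labels = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0], [0, 0, 0], [1, 1, 0]]
gold = [majority(row) for row in labels]
predictions = [1, 1, 1, 1, 0, 1]
print(round(fleiss_kappa(labels), 3))  # 0.325
print(fp_share_on_disagreement(labels, predictions, gold))  # 0.5
```

With three annotators, an item all three agree on contributes full observed agreement, while each 2-1 split contributes only 1/3, so a large share of boundary cases (such as those between factual description and legal reasoning) pulls κ down quickly.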
