Where Experts Disagree, Models Fail: Detecting Implicit Legal Citations in French Court Decisions
This work addresses the challenge of analyzing law at scale for legal scholars, but its contribution is incremental because it is limited to a single domain and benchmark.
The study tackles the problem of detecting implicit legal citations in French court decisions, which requires distinguishing legal reasoning from semantic similarity. It achieves 77% accuracy with a supervised ensemble and 76% precision in an unsupervised top-k ranking setting.
Computational methods applied to legal scholarship hold the promise of analyzing law at scale. We start from a simple question: how often do courts implicitly apply statutory rules? This requires distinguishing legal reasoning from semantic similarity. We focus on implicit citation of the French Civil Code in first-instance court decisions and introduce a benchmark of 1,015 passage-article pairs annotated by three legal experts. We show that expert disagreement predicts model failures. Inter-annotator agreement is moderate ($\kappa = 0.33$) with 43% of disagreements involving the boundary between factual description and legal reasoning. Our supervised ensemble achieves F1 = 0.70 (77% accuracy), but this figure conceals an asymmetry: 68% of false positives fall on the 33% of cases where the annotators disagreed. Despite these limits, reframing the task as top-k ranking and leveraging multi-model consensus yields 76% precision at k = 200 in an unsupervised setting. Moreover, the remaining false positives tend to surface legally ambiguous applications rather than obvious errors.
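To make the top-k reframing concrete, the following is a minimal sketch of how a multi-model consensus ranking and precision at k could be computed. The abstract does not specify how consensus is formed, so averaging per-model ranks is an assumption made here for illustration. The model names, scores, and labels are toy values, not data from the benchmark, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Return the rank of each item under one model (0 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def consensus_top_k(model_scores: Dict[str, Sequence[float]], k: int) -> List[int]:
    """Order items by mean rank across models and return the top-k indices.

    Assumption: consensus = average rank. Other aggregation rules
    (voting, score averaging) would fit the same interface.
    """
    per_model_ranks = [ranks_from_scores(s) for s in model_scores.values()]
    n_items = len(per_model_ranks[0])
    mean_rank = [
        sum(r[i] for r in per_model_ranks) / len(per_model_ranks)
        for i in range(n_items)
    ]
    return sorted(range(n_items), key=lambda i: mean_rank[i])[:k]


def precision_at_k(top_indices: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of the retrieved items whose label is positive (1)."""
    return sum(labels[i] for i in top_indices) / len(top_indices)


# Toy example: five passage-article pairs scored by three hypothetical models.
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.1],
    "model_b": [0.7, 0.3, 0.9, 0.6, 0.2],
    "model_c": [0.8, 0.1, 0.6, 0.7, 0.3],
}
labels = [1, 0, 1, 0, 0]  # 1 = experts judged the article implicitly applied

top = consensus_top_k(model_scores, k=3)
p = precision_at_k(top, labels)
print(top, round(p, 3))
```

In this setting no labels are needed to build the ranking; they are used only to evaluate it, which is what makes the approach unsupervised. The reported figure would correspond to calling `precision_at_k` on the 200 highest-ranked pairs.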