CL · Nov 15, 2023

LePaRD: A Large-Scale Dataset of Judges Citing Precedents

arXiv:2311.09356v3 · 3 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem for legal NLP researchers and practitioners: it provides a dataset to facilitate legal retrieval and reasoning tasks, with the potential to expand access to justice. The contribution is incremental, since it centers on dataset creation and benchmarking.

The authors tackled the problem of legal passage prediction by creating LePaRD, a large-scale dataset of U.S. federal judicial citations to precedent, and found that classification methods performed best in evaluations, though the task remains difficult with significant room for improvement.

We present the Legal Passage Retrieval Dataset LePaRD. LePaRD is a massive collection of U.S. federal judicial citations to precedent in context. The dataset aims to facilitate work on legal passage prediction, a challenging practice-oriented legal retrieval and reasoning task. Legal passage prediction seeks to predict relevant passages from precedential court decisions given the context of a legal argument. We extensively evaluate various retrieval approaches on LePaRD, and find that classification appears to work best. However, we note that legal precedent prediction is a difficult task, and there remains significant room for improvement. We hope that by publishing LePaRD, we will encourage others to engage with a legal NLP task that promises to help expand access to justice by reducing the burden associated with legal research. A subset of the LePaRD dataset is freely available and the whole dataset will be released upon publication.

Code Implementations: 1 repo