CL AI IR LG | Oct 31, 2025

IL-PCSR: Legal Corpus for Prior Case and Statute Retrieval

Shounak Paul, Dhananjay Ghumare, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi

arXiv:2511.00268v1 | 4.91 citations | h-index: 9 | EMNLP

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap for law practitioners by providing a common testbed for both statute and precedent retrieval, though it is incremental as it builds on existing retrieval methods.

The authors tackle the retrieval of relevant statutes and prior cases in legal practice by creating a unified corpus (IL-PCSR) that lets models exploit the dependence between the two tasks. They also develop an LLM-based re-ranking approach, which achieves the best performance among the methods they evaluate.

Identifying and retrieving relevant statutes and prior cases (precedents) for a given legal situation are common tasks for law practitioners. Researchers to date have addressed the two tasks independently, developing completely different datasets and models for each. However, the two retrieval tasks are inherently related: for example, similar cases tend to cite similar statutes because they involve similar factual situations. In this paper, we address this gap. We propose IL-PCSR (Indian Legal corpus for Prior Case and Statute Retrieval), a unique corpus that provides a common testbed for developing models for both tasks (Statute Retrieval and Precedent Retrieval) that can exploit the dependence between the two. We experiment extensively with several baseline models on the tasks, including lexical models, semantic models, and GNN-based ensembles. Further, to exploit the dependence between the two tasks, we develop an LLM-based re-ranking approach that gives the best performance.
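To make the dependence between the two tasks concrete, the sketch below re-ranks candidate precedents using statute overlap. It is a simple illustrative heuristic, not the LLM-based re-ranker described in the abstract. The function names, the `alpha` weight, the first-stage scores, and the statute labels are all hypothetical, and the first-stage scores are assumed to be normalized to [0, 1].

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of statute identifiers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank_with_statutes(first_stage, query_statutes, case_statutes, alpha=0.5):
    """Blend a first-stage precedent score with statute overlap.

    first_stage    : dict of case_id -> score in [0, 1] (e.g. from a lexical model)
    query_statutes : statutes predicted or known for the query (assumption)
    case_statutes  : dict of case_id -> statutes cited by that candidate case
    alpha          : hypothetical weight given to the statute-overlap signal
    """
    combined = {
        cid: (1 - alpha) * score
        + alpha * jaccard(query_statutes, case_statutes.get(cid, ()))
        for cid, score in first_stage.items()
    }
    # Highest combined score first.
    return sorted(combined, key=lambda cid: combined[cid], reverse=True)


# Toy data: all identifiers and scores below are made up for illustration.
first_stage = {"case_A": 0.60, "case_B": 0.55, "case_C": 0.20}
query_statutes = {"IPC 302", "IPC 34"}
case_statutes = {
    "case_A": {"IPC 420"},
    "case_B": {"IPC 302", "IPC 34"},
    "case_C": {"IPC 302"},
}

ranking = rerank_with_statutes(first_stage, query_statutes, case_statutes)
print(ranking)
```

With these toy inputs, `case_B` moves ahead of `case_A` because it shares both statutes with the query, even though its first-stage score is slightly lower. The intuition that similar cases cite similar statutes is what a joint corpus makes testable.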
