CL LG May 3, 2023

CiteCaseLAW: Citation Worthiness Detection in Caselaw for Legal Assistive Writing

Mann Khatri, Pritish Wadhwa, Gitansh Satija, Reshma Sheik, Yaman Kumar, Rajiv Ratn Shah, Ponnurangam Kumaraguru

arXiv:2305.03508v1 0.5

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of automating citation detection for legal professionals, but its contribution is incremental because it builds on existing citation recommendation systems.

The paper tackles the problem of identifying citation-worthy sentences in legal documents to assist writing. A domain-specific pre-trained model achieves an 88% F1-score on a new dataset of 178M sentences.

In legal writing, properly citing case law and other sources to substantiate claims and arguments is a key element. Understanding the legal domain and identifying appropriate citation contexts, or cite-worthy sentences, are challenging tasks that demand expensive manual annotation. Jargon, language semantics, and high domain specificity make legal language complex, which makes associated legal tasks hard to automate. The current work focuses on the problem of citation-worthiness identification. It is designed as the initial step in today's citation recommendation systems, to lighten the burden of extracting an adequate set of citation contexts. To accomplish this, we introduce a labeled dataset of 178M sentences for citation-worthiness detection in the legal domain, drawn from the Caselaw Access Project (CAP). We examine the performance of various deep learning models on this novel dataset. The domain-specific pre-trained model tends to outperform the other models, with an 88% F1-score on the citation-worthiness detection task.
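
As a reading aid for the reported metric, the following is a minimal sketch of how an F1-score can be computed for a binary citation-worthiness task. It assumes labels of 1 for citation-worthy and 0 otherwise, and it scores the positive class only; the abstract does not state which F1 variant (positive-class, macro, or weighted) the authors report. The function name `binary_f1` and the toy labels are hypothetical and not taken from the paper.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score for the positive class (assumed: 1 = citation-worthy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # With no true positives, precision and recall are both zero (or undefined),
    # so the score is reported as 0.0 by convention.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for six sentences (not data from the paper).
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
score = binary_f1(gold, pred)
print(f"F1 = {score:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a model cannot score well by labeling every sentence as citation-worthy or by labeling none.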
