IR CL Jul 19, 2021

Unsupervised Identification of Relevant Prior Cases

arXiv:2107.08973v1 (8 citations)

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for efficient document retrieval in the legal domain, but it is incremental as it applies existing methods to this specific task.

The paper tackles the problem of identifying relevant legal precedents for a given query case using unsupervised methods. Evaluated on precision@10, recall@10, and MRR, the combination of TF-IDF and BM25 scores gave the best results.
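For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how they are commonly computed. It is not taken from the paper. The function names and the convention of dividing precision by k even when fewer than k documents are returned are assumptions made for illustration.

```python
def precision_at_k(ranked, relevant, k=10):
    """Share of the top-k retrieved documents that are relevant.

    Assumption: the denominator is always k, even for shorter result lists.
    """
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k=10):
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

Here `ranked` is the ordered list of retrieved case identifiers for one query and `relevant` is the set of identifiers judged relevant for it. Both are hypothetical inputs, not data from the paper.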

Document retrieval now plays a role in almost every domain of knowledge understanding, including the legal domain. A precedent is a court decision that is treated as authority for deciding subsequent cases involving identical or similar facts or similar legal issues. In this work, we propose several unsupervised approaches to identifying the precedents relevant to a given query case. The approaches are: using word embeddings such as word2vec, doc2vec, and sent2vec; computing cosine similarity over TF-IDF vectors; retrieving relevant documents by BM25 score; using a pre-trained model and SBERT to find the most similar document; and using the product of the BM25 and TF-IDF scores to find the most relevant document for a given query. We compared all the methods on precision@10, recall@10, and MRR. In this comparison, the TF-IDF score multiplied by the BM25 score gave the best result. We also present the analysis we carried out to improve the BM25 score.
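The best-performing combination described in the abstract, a TF-IDF score multiplied by a BM25 score, can be illustrated with a small standard-library sketch. This is not the authors' implementation. The whitespace tokenizer, the smoothed IDF formulas, the parameters `k1=1.5` and `b=0.75`, the use of unique query terms in BM25, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    return text.lower().split()


def tfidf_vector(tokens, idf):
    # Raw term frequency times IDF; terms unseen in the corpus are dropped.
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bm25_scores(query, docs, df, k1=1.5, b=0.75):
    # Okapi BM25 with a non-negative IDF variant, summed over unique query terms.
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank_by_product(query_text, corpus):
    """Return (doc_index, score) pairs sorted by TF-IDF cosine * BM25."""
    docs = [tokenize(d) for d in corpus]
    query = tokenize(query_text)
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    # Smoothed IDF for the TF-IDF vectors (an assumed, commonly used variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    query_vec = tfidf_vector(query, idf)
    cos = [cosine(query_vec, tfidf_vector(d, idf)) for d in docs]
    bm25 = bm25_scores(query, docs, df)
    combined = [c * s for c, s in zip(cos, bm25)]
    return sorted(enumerate(combined), key=lambda pair: pair[1], reverse=True)
```

Calling `rank_by_product(query_case_text, candidate_case_texts)` returns the candidate indices ordered from most to least relevant under this combined score. Because the two scores are multiplied, a document ranks highly only if both signals agree, and a document that shares no terms with the query scores zero.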
