CL AI · Mar 26, 2024

Enhancing Legal Document Retrieval: A Multi-Phase Approach with Large Language Models

Hai-Long Nguyen, Duc-Minh Nguyen, Tan-Minh Nguyen, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong, Ken Satoh

arXiv:2403.18093v13.46 citations · h-index: 7

Originality: Synthesis-oriented

AI Analysis

This work addresses retrieval challenges in the legal domain, but its contribution is incremental, since it builds on existing techniques such as BM25 and BERT.

The paper tackles legal document retrieval with a multi-phase system that combines BM25 pre-ranking, BERT-based re-ranking, and prompting with large language models, and it reports significant accuracy improvements on the COLIEE 2023 dataset.

Large language models with billions of parameters, such as GPT-3.5, GPT-4, and LLaMA, are increasingly prevalent. Numerous studies have explored effective prompting techniques to harness the power of these LLMs for various research problems. Retrieval, particularly in the legal domain, is difficult to tackle by applying prompting techniques directly, because legal articles are both numerous and long. This research focuses on maximizing the potential of prompting by placing it as the final phase of the retrieval system, preceded by two supporting phases: BM25 pre-ranking and BERT-based re-ranking. Experiments on the COLIEE 2023 dataset demonstrate that integrating LLM prompting into the retrieval system significantly improves retrieval accuracy. However, error analysis reveals several remaining issues in the retrieval system that still need to be resolved.
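The three-phase flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cut-off sizes (`k_bm25`, `k_rerank`), the prompt wording, and the toy articles are all assumptions made for illustration, and `overlap_reranker` is a simple token-overlap placeholder standing in for a BERT-based scorer. The sketch stops at building the prompt and makes no LLM call.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper one.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def overlap_reranker(query, doc):
    """Placeholder for a BERT-based scorer: fraction of query tokens found in doc."""
    q = set(tokenize(query))
    if not q:
        return 0.0
    return len(q & set(tokenize(doc))) / len(q)


def build_prompt(query, candidates):
    """Format the short list as an LLM prompt (wording is illustrative only)."""
    lines = [f"[{i}] {text}" for i, text in candidates]
    return (
        "Query: " + query + "\nCandidate articles:\n" + "\n".join(lines)
        + "\nList the ids of the articles relevant to the query."
    )


def retrieve(query, docs, k_bm25=3, k_rerank=2, rerank_fn=overlap_reranker):
    # Phase 1: BM25 pre-ranking narrows the corpus to k_bm25 candidates.
    stage1 = top_k(bm25_scores(query, docs), k_bm25)
    # Phase 2: re-rank only those candidates with the (placeholder) scorer.
    rescored = sorted(stage1, key=lambda i: rerank_fn(query, docs[i]), reverse=True)
    stage2 = rescored[:k_rerank]
    # Phase 3: pass the short list to an LLM through a prompt.
    return stage2, build_prompt(query, [(i, docs[i]) for i in stage2])


# Toy articles, invented for this example.
articles = [
    "A contract is formed when an offer is accepted.",
    "A minor may rescind a contract made without consent of a guardian.",
    "Ownership of land is transferred by registration.",
    "A person who causes damage by negligence is liable for compensation.",
]
ids, prompt = retrieve("can a minor rescind a contract", articles)
print(ids)
print(prompt)
```

The point of the staging is that each phase sees fewer, shorter inputs than the one before it, so the LLM prompt only has to hold a handful of articles rather than the whole corpus.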
