CL AI IR LG

Oct 14, 2024

Rethinking Legal Judgement Prediction in a Realistic Scenario in the Era of Large Language Models

Shubham Kumar Nigam, Aniket Deroy, Subhankar Maity, Arnab Bhattacharya

arXiv:2410.10542v1 15.730 citations h-index: 13 Has Code NLLP

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of realistic legal decision-making for legal professionals, though its contribution is incremental: it applies existing methods to a new scenario.

This study tackles legal judgment prediction in a realistic scenario using Indian judgments, comparing transformer models with LLMs such as GPT-3.5 Turbo. GPT-3.5 Turbo performed best, and adding legal information improved predictions, but LLMs still fell short of expert-level performance.

This study investigates judgment prediction in a realistic scenario within the context of Indian judgments, utilizing a range of transformer-based models, including InLegalBERT, BERT, and XLNet, alongside LLMs such as Llama-2 and GPT-3.5 Turbo. In this realistic scenario, we simulate how judgments are predicted at the point when a case is presented for a decision in court, using only the information available at that time, such as the facts of the case, statutes, precedents, and arguments. This approach mimics real-world conditions, where decisions must be made without the benefit of hindsight, unlike retrospective analyses often found in previous studies. For transformer models, we experiment with hierarchical transformers and the summarization of judgment facts to optimize input for these models. Our experiments with LLMs reveal that GPT-3.5 Turbo excels in realistic scenarios, demonstrating robust performance in judgment prediction. Furthermore, incorporating additional legal information, such as statutes and precedents, significantly improves the outcome of the prediction task. The LLMs also provide explanations for their predictions. To evaluate the quality of these predictions and explanations, we introduce two human evaluation metrics: Clarity and Linking. Our findings from both automatic and human evaluations indicate that, despite advancements in LLMs, they are yet to achieve expert-level performance in judgment prediction and explanation tasks.
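The realistic setting described in the abstract can be illustrated with a minimal sketch of how an LLM input might be assembled from only the material available before the decision. This is not the authors' code: the `CaseRecord` container, its field names, the `build_prompt` helper, and the instruction wording are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaseRecord:
    """Hypothetical container; field names are illustrative, not from the paper."""
    facts: str
    statutes: List[str] = field(default_factory=list)
    precedents: List[str] = field(default_factory=list)
    arguments: str = ""
    # Post-decision material: never passed to the model in the realistic setting.
    court_reasoning: str = ""
    final_decision: str = ""


def build_prompt(case: CaseRecord, use_legal_context: bool = True) -> str:
    """Assemble a prompt from pre-decision information only.

    With use_legal_context=False, statutes and precedents are left out, which
    allows a comparison between inputs with and without that legal information.
    """
    sections = [("Facts", case.facts)]
    if use_legal_context:
        if case.statutes:
            sections.append(("Statutes", "\n".join(case.statutes)))
        if case.precedents:
            sections.append(("Precedents", "\n".join(case.precedents)))
    if case.arguments:
        sections.append(("Arguments", case.arguments))

    body = "\n\n".join(f"{title}:\n{text}" for title, text in sections)
    # Assumed instruction wording; the paper's actual prompts may differ.
    instruction = "Predict the outcome of this case, then briefly explain the prediction."
    return f"{body}\n\n{instruction}"
```

Calling `build_prompt(case)` and `build_prompt(case, use_legal_context=False)` on the same record yields the two input variants. The `court_reasoning` and `final_decision` fields are never read by the helper, which is how the sketch avoids hindsight leaking into the input.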
