CL · Jun 13, 2022

Indian Legal Text Summarization: A Text Normalisation-based Approach

Satyajit Ghosh, Mousumi Dutta, Tanaya Das

arXiv:2206.06238v20.627 citations · h-index: 14

Originality: Synthesis-oriented

AI Analysis

This work addresses the time-consuming task of summarizing legal documents for Indian legal stakeholders, but the contribution is incremental: it adapts existing models by adding a text normalization step.

The authors tackled the problem of summarizing Indian legal texts by proposing a text normalization approach to improve domain-independent models like BART and PEGASUS, showing effectiveness through expert evaluation and ROUGE metrics.

Pending cases have long been a problem in the Indian court system: more than 4 crore (40 million) cases are outstanding. Manually summarising hundreds of documents is time-consuming and tedious for legal stakeholders. Many state-of-the-art text summarization models have emerged as machine learning has progressed, but domain-independent models perform poorly on legal texts, and fine-tuning them for the Indian legal system is difficult because publicly available datasets are lacking. To improve the performance of domain-independent models, the authors propose a methodology for normalising legal texts in the Indian context. They experiment with two state-of-the-art domain-independent summarization models, BART and PEGASUS, testing both on extractive and abstractive summarization to gauge the effectiveness of the text normalisation approach. The summaries are evaluated by domain experts on multiple parameters and with ROUGE metrics. The results indicate that the proposed text normalisation approach is effective for summarising legal texts with domain-independent models.
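The ROUGE evaluation mentioned in the abstract can be illustrated with a minimal sketch of ROUGE-N, which scores a candidate summary by its n-gram overlap with a reference summary. This is not the authors' evaluation code: the function names `ngrams` and `rouge_n`, the whitespace tokenisation, and the lowercasing are assumptions made for illustration, and the example sentences are invented. Established ROUGE implementations usually add stemming and other preprocessing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall and F1 for two strings.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented example sentences, not taken from the paper.
    candidate = "the court dismissed the appeal"
    reference = "the court allowed the appeal"
    print(rouge_n(candidate, reference, n=1))
    print(rouge_n(candidate, reference, n=2))
```

Recall is the component usually emphasised for summarization, since it measures how much of the reference summary the candidate covers; precision penalises padding the candidate with unrelated text.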
