Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation
This work addresses the challenge of summarizing lengthy legal documents for legal professionals, but its contribution is incremental: it applies existing summarization methods to a new domain.
The paper tackles the problem of summarizing long legal case documents by comparing extractive and abstractive methods. It finds that extractive approaches often outperform abstractive ones because of input token length constraints, and it draws key insights from evaluations involving law practitioners.
Summarization of legal case judgement documents is a challenging problem in Legal NLP. However, few analyses exist of how different families of summarization models (e.g., extractive vs. abstractive) perform when applied to legal case documents. This question is particularly important since many recent transformer-based abstractive summarization models restrict the number of input tokens, and legal documents are known to be very long. How best to evaluate legal case document summarization systems also remains an open question. In this paper, we carry out extensive experiments with several extractive and abstractive summarization methods (both supervised and unsupervised) over three legal summarization datasets that we have developed. Our analyses, which include evaluation by law practitioners, lead to several interesting insights on legal summarization in particular and long document summarization in general.
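
To make the input-length constraint concrete, here is a minimal sketch of one common workaround: splitting a long document into chunks that each fit a model's token limit. It is not taken from the paper and may not match what the authors did. The function name `chunk_by_token_budget`, the `max_tokens` value, and the use of whitespace tokens are all illustrative assumptions; real transformer models count subword tokens, so actual budgets would differ.

```python
from typing import List


def chunk_by_token_budget(sentences: List[str], max_tokens: int) -> List[List[str]]:
    """Greedily group sentences into chunks of at most `max_tokens` tokens.

    Assumption: tokens are approximated by whitespace-separated words.
    A single sentence longer than the budget is kept whole in its own chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n_tokens = len(sentence.split())
        # Close the current chunk if this sentence would push it over budget.
        if current and current_len + n_tokens > max_tokens:
            chunks.append(current)
            current = []
            current_len = 0
        current.append(sentence)
        current_len += n_tokens
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Toy stand-in for a long judgement; the budget of 8 is a hypothetical limit.
    document = [
        "The appellant filed a suit for recovery.",
        "The trial court dismissed the suit.",
        "The High Court reversed that decision on appeal.",
    ]
    for i, chunk in enumerate(chunk_by_token_budget(document, max_tokens=8)):
        print(i, chunk)
```

Each chunk could then be summarized separately and the partial summaries concatenated. Extractive methods that score sentences individually do not need this step, which is one way to read the observation that input limits put abstractive models at a disadvantage on long documents.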