Legal Area Classification: A Comparative Study of Text Classifiers on Singapore Supreme Court Judgments
This work is aimed at legal professionals and researchers who need efficient text classification in law; its contribution is incremental, since it applies existing methods to a new dataset.
The paper compares machine learning approaches for classifying Singapore Supreme Court judgments into legal areas and finds that all tested methods perform well with only a few hundred documents, although further optimization for the legal domain is still needed.
This paper presents a comparative study of the performance of various machine learning (``ML'') approaches for classifying judgments into legal areas. Using a novel dataset of 6,227 Singapore Supreme Court judgments, we investigate how state-of-the-art NLP methods compare against traditional statistical models when applied to a legal corpus that comprises few but lengthy documents. All approaches tested, including topic model, word embedding, and language model-based classifiers, performed well with as few as a few hundred judgments. However, more work is needed to optimize state-of-the-art methods for the legal domain.
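To make the classification task concrete, the following is a minimal sketch of one kind of traditional statistical baseline: a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. It is an illustration only and is not claimed to be one of the models evaluated in the study. The class name `NaiveBayesBaseline`, the crude whitespace tokenizer, the single-label setting, the two legal-area labels, and the four toy training sentences are all assumptions introduced here; they are not drawn from the dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately crude: lowercase, split on whitespace, keep alphabetic tokens.
    return [t for t in text.lower().split() if t.isalpha()]


class NaiveBayesBaseline:
    """Hypothetical bag-of-words multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per legal area
        self.word_counts = defaultdict(Counter)  # token counts per legal area
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for illustration, not taken from any judgment).
train_docs = [
    "the defendant breached the contract by failing to deliver the goods",
    "damages for breach of contract and repudiation of the agreement",
    "the accused was convicted of theft and sentenced to imprisonment",
    "the prosecution appealed against the sentence imposed on the accused",
]
train_labels = ["contract", "contract", "criminal", "criminal"]

model = NaiveBayesBaseline().fit(train_docs, train_labels)
print(model.predict("the accused appealed against his sentence"))
print(model.predict("repudiation of the contract"))
```

A real judgment may fall under several legal areas at once, so a faithful setup would likely need a multi-label formulation (for example, one binary classifier per area) and a tokenizer suited to lengthy legal text; the sketch above only shows the single-label core of such a baseline.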