Predicting Legal Proceedings Status: Approaches Based on Sequential Text Data
The work is intended to help public and private institutions manage large portfolios of legal proceedings more efficiently. Its contribution is incremental, since it applies existing NLP and machine learning techniques to a specific domain.
The paper addresses the prediction of the status of Brazilian legal proceedings from sequential text data, reporting a maximum accuracy of 0.93 and top average F1 scores of 0.89 (macro) and 0.93 (weighted).
The objective of this paper is to develop predictive models that classify Brazilian legal proceedings into three possible status classes: (i) archived proceedings, (ii) active proceedings, and (iii) suspended proceedings. Solving this problem is intended to assist public and private institutions in managing large portfolios of legal proceedings, providing gains in scale and efficiency. In this paper, legal proceedings are made up of sequences of short texts called "motions." We combined several natural language processing (NLP) and machine learning techniques to solve the problem. Although NLP for Portuguese can be challenging due to a lack of resources, our approaches performed remarkably well in the classification task, achieving a maximum accuracy of 0.93 and top average F1 scores of 0.89 (macro) and 0.93 (weighted). Furthermore, we were able to extract and interpret the patterns learned by one of our models and to quantify how those patterns relate to the classification task. This interpretability step is important in legal applications of machine learning and gives us insight into how black-box models make decisions.
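
To make the difference between the macro and weighted F1 averages reported above concrete, the following minimal Python sketch computes both by hand for the three status classes. The labels and predictions are invented toy data, not results from the paper, and the helper names (`per_class_f1`, `macro_f1`, `weighted_f1`) are illustrative assumptions rather than anything the authors describe.

```python
from collections import Counter

# The three status classes named in the abstract.
CLASSES = ["archived", "active", "suspended"]


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class is never true or predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=CLASSES):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


def weighted_f1(y_true, y_pred, labels=CLASSES):
    """Mean of per-class F1 weighted by each class's share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(support[c] / total * per_class_f1(y_true, y_pred, c) for c in labels)


def accuracy(y_true, y_pred):
    """Fraction of proceedings whose predicted status matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced toy data: 6 archived, 3 active, 1 suspended.
y_true = ["archived"] * 6 + ["active"] * 3 + ["suspended"]
y_pred = ["archived"] * 6 + ["active", "active", "archived"] + ["active"]

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)

print(f"accuracy    = {acc:.2f}")
print(f"macro F1    = {macro:.2f}")
print(f"weighted F1 = {weighted:.2f}")
```

On this toy data the rare "suspended" class is missed entirely, which pulls the macro average well below the weighted one. The paper's gap between 0.89 (macro) and 0.93 (weighted) would be consistent with a similar but much milder effect, although the abstract does not say which class scored lowest.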