LG | Mar 11, 2022

verBERT: Automating Brazilian Case Law Document Multi-label Categorization Using BERT

arXiv:2203.06224v1 | 7 citations | h-index: 19
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient document categorization in the Brazilian legal domain, but it is incremental as it applies an existing method (BERT) to a new dataset.

The paper tackles the problem of automating multi-label categorization of Brazilian case law documents using BERT, achieving an F1-micro score of 0.72, a 30-percentage-point improvement over a statistical baseline.

In this work, we study the use of attention-based algorithms to automate the categorization of Brazilian case law documents. We used data from the Kollemata Project to produce two distinct datasets with adequate class systems. Then, we implemented a multi-class and multi-label version of BERT and fine-tuned different BERT models on the produced datasets. We evaluated several metrics, adopting the micro-averaged F1-Score as our main metric, for which we obtained F1-micro = 0.72, a gain of 30 percentage points over the tested statistical baseline.
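To make the main metric concrete, here is a minimal sketch of how a micro-averaged F1 score can be computed for multi-label predictions. It is not the paper's code: the function names, the toy data, and the 0.5 sigmoid threshold are assumptions chosen for illustration. Micro-averaging pools true positives, false positives, and false negatives over all documents and all labels before computing F1, so frequent labels weigh more than rare ones.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(
    logits: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Turn per-label logits into 0/1 decisions.

    Each label is decided independently, which is what makes the
    setup multi-label. The 0.5 threshold is an illustrative assumption.
    """
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def micro_f1(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every (document, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    denominator = 2 * tp + fp + fn
    # With no positives anywhere, define the score as 0.0 to avoid dividing by zero.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy data: 3 documents, 3 hypothetical labels.
    gold = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    toy_logits = [[2.0, -1.0, -0.5], [-2.0, 1.5, 0.3], [0.8, -0.2, -3.0]]
    predictions = predict_labels(toy_logits)
    print(f"F1-micro = {micro_f1(gold, predictions):.2f}")
```

On the toy data the pooled counts are 3 true positives, 1 false positive, and 2 false negatives, giving F1-micro = 6 / 9, or about 0.67.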

Code Implementations: 1 repo
