LG · Mar 11, 2022

verBERT: Automating Brazilian Case Law Document Multi-label Categorization Using BERT

arXiv:2203.06224v14.67 citations · h-index: 19 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for efficient document categorization in the Brazilian legal domain, but it is incremental as it applies an existing method (BERT) to a new dataset.

The paper tackled the problem of automating multi-label categorization of Brazilian case law documents using BERT, achieving an F1-micro score of 0.72, which represents a 30 percentage point improvement over a statistical baseline.

In this work, we carried out a study of the use of attention-based algorithms to automate the categorization of Brazilian case law documents. We used data from the Kollemata Project to produce two distinct datasets with adequate class systems. Then, we implemented a multi-class and multi-label version of BERT and fine-tuned different BERT models with the produced datasets. We evaluated several metrics, adopting the micro-averaged F1-Score as our main metric, on which we obtained F1-micro = 0.72, a gain of 30 percentage points over the tested statistical baseline.
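To make the headline metric concrete, here is a minimal sketch of how a micro-averaged F1-Score can be computed for multi-label predictions. It is not the authors' code: the sigmoid-plus-threshold decision rule, the threshold of 0.5, the function names, and the toy logits and labels are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Decide each class independently (multi-label), assuming a 0.5 threshold."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all documents and classes first."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 2 documents, 3 candidate categories.
logits = [[2.0, -1.0, 0.5], [-3.0, 1.5, -0.2]]
y_true = [[1, 0, 0], [0, 1, 1]]

y_pred = predict_labels(logits)
score = micro_f1(y_true, y_pred)
```

Because the counts are pooled before the ratio is taken, frequent categories weigh more heavily in the micro average than rare ones, which is worth keeping in mind when reading a single figure such as 0.72.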
