CL, Sep 15, 2021

The Unreasonable Effectiveness of the Baseline: Discussing SVMs in Legal Text Classification

arXiv:2109.07234v2 1.216 citations

Originality: Synthesis-oriented

AI Analysis

This work highlights an incremental finding for legal NLP practitioners, suggesting that simpler methods may suffice, potentially reducing computational costs.

The paper shows that Support Vector Machines perform competitively with BERT-based models on the legal text classification tasks in the LexGLUE benchmark, and that the error reduction gained from specialized models over baselines is smaller in the legal domain than on general language tasks.

We aim to highlight an interesting trend to contribute to the ongoing debate around advances within legal Natural Language Processing. Recently, the focus for most legal text classification tasks has shifted towards large pre-trained deep learning models such as BERT. In this paper, we show that a more traditional approach based on Support Vector Machine classifiers reaches surprisingly competitive performance with BERT-based models on the classification tasks in the LexGLUE benchmark. We also highlight that error reduction obtained by using specialised BERT-based models over baselines is noticeably smaller in the legal domain when compared to general language tasks. We present and discuss three hypotheses as potential explanations for these results to support future discussions.
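To make the "error reduction" comparison concrete, here is a minimal sketch that assumes it means relative error reduction, i.e. the share of the baseline's error that the stronger model removes. The abstract does not give the exact formula, so this definition, the function name, and all scores below are illustrative assumptions rather than the paper's method or results.

```python
def relative_error_reduction(baseline_score: float, model_score: float) -> float:
    """Fraction of the baseline's error removed by the model.

    Scores are expected in [0, 1] (e.g. accuracy or F1).
    This definition is an assumption, not taken from the paper.
    """
    baseline_error = 1.0 - baseline_score
    if baseline_error <= 0.0:
        raise ValueError("baseline already has zero error")
    model_error = 1.0 - model_score
    return (baseline_error - model_error) / baseline_error


# Hypothetical scores for illustration only, not results from the paper.
general_reduction = relative_error_reduction(baseline_score=0.70, model_score=0.85)
legal_reduction = relative_error_reduction(baseline_score=0.80, model_score=0.82)

print(f"general-domain error reduction: {general_reduction:.0%}")
print(f"legal-domain error reduction:   {legal_reduction:.0%}")
```

Under this reading, a specialised model that lifts a strong baseline by only a couple of points removes a small share of the remaining error, which is the pattern the abstract describes for the legal domain.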

