CL LG | Dec 12, 2023

Classifying complex documents: comparing bespoke solutions to large language models

arXiv:2312.07182v1 | 2 citations | h-index: 9

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of automating document classification for legal professionals, but its contribution is incremental: it compares existing methods on a new dataset.

The study tackles the problem of classifying complex legal documents by comparing a fine-tuned large language model (LLM) with a bespoke, custom-trained model. It finds that the fine-tuned LLM achieves comparable accuracy, subject to specific fine-tuning requirements.

Here we search for the best automated classification approach for a set of complex legal documents. Our classification task is not trivial: our aim is to classify ca. 30,000 public courthouse records from 12 states and 267 counties at two different levels using nine sub-categories. Specifically, we investigated whether a fine-tuned large language model (LLM) can achieve the accuracy of a bespoke custom-trained model, and how much fine-tuning is necessary.
