CL Mar 6, 2024

SaulLM-7B: A pioneering Large Language Model for Law

Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, Andre F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, Michael Desa

arXiv:2403.03883v228.5163 citations h-index: 15

Originality: Incremental advance

AI Analysis

The paper addresses the need for specialized AI tools in law with a model built for legal tasks, though it builds on the existing Mistral 7B architecture rather than introducing a new one.

The paper introduces SaulLM-7B, a 7-billion-parameter large language model tailored for the legal domain and trained on an English legal corpus of over 30 billion tokens. The authors report state-of-the-art proficiency in legal text comprehension and generation.

In this paper, we introduce SaulLM-7B, a large language model (LLM) tailored for the legal domain. With 7 billion parameters, SaulLM-7B is the first LLM designed explicitly for legal text comprehension and generation. Leveraging the Mistral 7B architecture as its foundation, SaulLM-7B is trained on an English legal corpus of over 30 billion tokens. SaulLM-7B exhibits state-of-the-art proficiency in understanding and processing legal documents. Additionally, we present a novel instructional fine-tuning method that leverages legal datasets to further enhance SaulLM-7B's performance in legal tasks. SaulLM-7B is released under the MIT License.
