MEL: Legal Spanish Language Model
This work addresses the handling of complex legal terminology in Spanish, a language underrepresented in NLP, but it is incremental because it builds on existing pre-trained models.
The paper tackles the challenge of processing legal Spanish texts by developing MEL, a language model fine-tuned on legal documents, which shows a significant improvement over baseline models in understanding legal Spanish.
Legal texts, characterized by complex and specialized terminology, present a significant challenge for language models. Working in a language that is underrepresented in NLP, such as Spanish, makes the task even harder. While pre-trained models like XLM-RoBERTa have shown capabilities in handling multilingual corpora, their performance on domain-specific documents remains underexplored. This paper presents the development and evaluation of MEL, a legal language model based on XLM-RoBERTa-large and fine-tuned on legal documents such as the BOE (Boletín Oficial del Estado, Spain's official state gazette) and congress texts. We detail the data collection, processing, training, and evaluation processes. Evaluation benchmarks show a significant improvement over baseline models in understanding legal Spanish. We also present case studies demonstrating the model's application to new legal texts, highlighting its potential to achieve top results across different NLP tasks.