CL · Jan 27, 2025

MEL: Legal Spanish Language Model

arXiv:2501.16011v1 · 1 citation · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses a real problem for NLP practitioners: handling complex legal terminology in an underrepresented language. Its contribution is incremental, however, because it builds on existing pre-trained models.

The paper tackles the processing of legal Spanish text by developing MEL, a language model fine-tuned on legal documents. According to the reported benchmarks, MEL improves significantly over baseline models at understanding legal Spanish.

Legal texts, characterized by complex and specialized terminology, present a significant challenge for language models. Working in an underrepresented language such as Spanish makes the task even harder. While pre-trained models like XLM-RoBERTa have shown they can handle multilingual corpora, their performance on domain-specific documents remains underexplored. This paper presents the development and evaluation of MEL, a legal language model based on XLM-RoBERTa-large and fine-tuned on legal documents such as the BOE (Boletín Oficial del Estado, the official Spanish gazette of laws) and congress texts. We detail the data collection, processing, training, and evaluation processes. Evaluation benchmarks show a significant improvement over baseline models in understanding legal Spanish. We also present case studies demonstrating the model's application to new legal texts, highlighting its potential to achieve top results across different NLP tasks.

