Spanish Legalese Language Model and Corpora
This work addresses the need for domain-specific language models in Spanish legal contexts, but its contribution is incremental, as it applies existing methods to new data.
The authors tackled the lack of specialized Spanish language models by creating a legal-domain model and corpora, achieving reasonable results on general Spanish tasks.
Many language models exist for English, reflecting its worldwide relevance. For Spanish, however, even though it is a widely spoken language, there are very few language models, and those that exist tend to be small and too general. Legal language (legalese) could be thought of as a Spanish variant in its own right, as it is very complicated in vocabulary, semantics and phrase understanding. For this work we gathered legal-domain corpora from different sources, generated a model and evaluated it against Spanish general-domain tasks. The model provides reasonable results on those tasks.