A Survey of Spanish Clinical Language Models
This work addresses the need for accessible and reproducible benchmarks in Spanish clinical NLP. Its contribution is incremental, since it surveys and compares existing models rather than introducing new methods.
The authors conducted a survey of encoder language models for clinical tasks in Spanish, benchmarking over 3000 fine-tuned models on curated corpora to identify the best-performing ones, with all data and models made publicly available for reproducibility.
This survey focuses on encoder language models for solving clinical-domain tasks in Spanish. We review the contributions of 17 corpora focused mainly on clinical tasks, then list the most relevant Spanish language models and Spanish clinical language models. We perform a thorough comparison of these models by benchmarking them on a curated subset of the available corpora to find the best-performing ones; in total, more than 3000 models were fine-tuned for this study. All the tested corpora and the best models are made publicly available in an accessible way, so that the results can be reproduced by independent teams or challenged in the future when new Spanish clinical language models are created.