CLSep 30, 2024
Language Resources in Spanish for Automatic Text Simplification across DomainsAntonio Moreno-Sandoval, Leonardo Campillos-Llanos, Ana García-Serrano
This work describes the language resources and models developed for automatic simplification of Spanish texts in three domains: Finance, Medicine and History studies. We created several corpora in each domain, annotation and simplification guidelines, a lexicon of technical and simplified medical terms, datasets used in shared tasks for the financial domain, and two simplification tools. The methodology, resources and companion publications are shared publicly on the web-site: https://clara-nlp.uned.es/.
CVJul 7, 2025
Transcribing Spanish Texts from the Past: Experiments with Transkribus, Tesseract and GraniteYanco Amor Torterolo-Orta, Jaione Macicior-Mitxelena, Marina Miguez-Lamanuzzi et al.
This article presents the experiments and results obtained by the GRESEL team in the IberLEF 2025 shared task PastReader: Transcribing Texts from the Past. Three types of experiments were conducted with the dual aim of participating in the task and enabling comparisons across different approaches. These included the use of a web-based OCR service, a traditional OCR engine, and a compact multimodal model. All experiments were run on consumer-grade hardware, which, despite lacking high-performance computing capacity, provided sufficient storage and stability. The results, while satisfactory, leave room for further improvement. Future work will focus on exploring new techniques and ideas using the Spanish-language dataset provided by the shared task, in collaboration with Biblioteca Nacional de España (BNE).