Mind the Language Gap in Digital Humanities: LLM-Aided Translation of SKOS Thesauri
This work addresses the need for more inclusive and multilingual research infrastructures in Digital Humanities. Its contribution is incremental, since it builds on existing translation and LLM methods.
The paper addresses how language diversity limits access to, and interoperability of, SKOS thesauri in Digital Humanities. It introduces WOKIE, a pipeline for automated translation that combines external translation services with LLM refinement, and evaluates it in 15 languages, reporting enhanced accessibility and improved ontology matching performance.
We introduce WOKIE, an open-source, modular, and ready-to-use pipeline for the automated translation of SKOS thesauri. This work addresses a critical need in the Digital Humanities (DH), where language diversity can limit access, reuse, and semantic interoperability of knowledge resources. WOKIE combines external translation services with targeted refinement using Large Language Models (LLMs), balancing translation quality, scalability, and cost. Designed to run on everyday hardware and to be easily extended, the application requires no prior expertise in machine translation or LLMs. We evaluate WOKIE on several DH thesauri in 15 languages, varying parameters, translation services, and LLMs, and systematically analyse translation quality, performance, and ontology matching improvements. Our results show that WOKIE can enhance the accessibility, reuse, and cross-lingual interoperability of thesauri through low-barrier automated translation and improved ontology matching performance, supporting more inclusive and multilingual research infrastructures.
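The abstract does not spell out how the external translation services and the LLM refinement are combined. As a minimal sketch of one plausible reading, the Python below accepts a translation when enough services agree and calls an LLM only for the remaining labels, which is one way to trade quality against cost. Everything in it is an assumption made for illustration and is not taken from WOKIE: the function names, the `min_agreement` threshold, the dictionary representation of SKOS concepts, and the toy services and refiner that stand in for real translation APIs and a real LLM.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an external translation service:
# (text, source_lang, target_lang) -> candidate translation.
Translator = Callable[[str, str, str], str]
# Hypothetical stand-in for an LLM call:
# (source_label, candidates, target_lang) -> final translation.
Refiner = Callable[[str, List[str], str], str]


def translate_label(label: str, src: str, tgt: str,
                    services: List[Translator], refine: Refiner,
                    min_agreement: int = 2) -> Tuple[str, bool]:
    """Return (translation, used_llm) for a single label."""
    candidates = [service(label, src, tgt).strip() for service in services]
    # Compare candidates case-insensitively and find the most common one.
    counts = Counter(c.lower() for c in candidates)
    best, votes = counts.most_common(1)[0]
    if votes >= min_agreement:
        # Enough services agree: keep the first candidate with that reading.
        agreed = next(c for c in candidates if c.lower() == best)
        return agreed, False
    # Services disagree: hand the candidates to the (assumed) LLM refiner.
    return refine(label, candidates, tgt), True


def translate_thesaurus(concepts: List[Dict], src: str, tgt: str,
                        services: List[Translator], refine: Refiner) -> int:
    """Add a target-language prefLabel where missing; return LLM call count."""
    llm_calls = 0
    for concept in concepts:
        labels = concept["prefLabel"]  # maps language tag -> label
        if tgt in labels or src not in labels:
            continue  # already translated, or nothing to translate from
        labels[tgt], used_llm = translate_label(
            labels[src], src, tgt, services, refine)
        llm_calls += int(used_llm)
    return llm_calls


# Toy services and refiner, for demonstration only.
def service_a(text: str, src: str, tgt: str) -> str:
    return {"amphora": "Amphore", "seal": "Siegel"}.get(text, text)


def service_b(text: str, src: str, tgt: str) -> str:
    return {"amphora": "Amphore", "seal": "Robbe"}.get(text, text)


def toy_refiner(label: str, candidates: List[str], tgt: str) -> str:
    # A real refiner would prompt an LLM with the concept's context.
    return candidates[0]


if __name__ == "__main__":
    thesaurus = [
        {"uri": "ex:amphora", "prefLabel": {"en": "amphora"}},
        {"uri": "ex:seal", "prefLabel": {"en": "seal"}},
    ]
    calls = translate_thesaurus(
        thesaurus, "en", "de", [service_a, service_b], toy_refiner)
    print(thesaurus, calls)
```

In this toy run the two services agree on "amphora", so no LLM call is needed, while the ambiguous "seal" (a stamp or an animal) is passed to the refiner. This is the kind of case where context from the thesaurus, such as broader concepts or definitions, could help an LLM choose the intended sense.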