CL, AI, IR, LG · Jun 26, 2025

skLEP: A Slovak General Language Understanding Benchmark

arXiv:2506.21508v1 · 3 citations · h-index: 4 · Has Code · ACL
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of standardized evaluation resources for Slovak NLU. The contribution is incremental: it adapts existing benchmark approaches to a new language.

The authors introduced skLEP, the first comprehensive benchmark for evaluating Slovak natural language understanding models, covering nine diverse tasks. They conducted the first systematic evaluation of various language models on this benchmark and released all resources publicly to foster future research.

In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hopes of fostering reproducibility and driving future research in Slovak NLU.

Code Implementations: 1 repo