CL · May 26

KZ-SafetyPrompts: A Kazakh Safety Evaluation Prompt Dataset for Large Language Models

Wajdi Zaghouani, Shimaa Amer Ibrahim, Aruzhan Muratbek, Olzhasbek Zhakenov, Adiya Akhmetzhanova

arXiv:2605.26947 · 68.6

Predicted impact: top 57% in CL · last 90 days · Originality: Synthesis-oriented

AI Analysis

For developers and researchers of LLM safety, this dataset addresses the underrepresentation of Kazakh in safety evaluation, exposing category-specific vulnerabilities not detected by English benchmarks.

The authors present KZ-SafetyPrompts, a Kazakh-language dataset of 5,717 safety evaluation prompts across 11 risk categories, and show that GPT-4o's refusal rate is only 28.2% overall, varying from 5.5% to 53.8% by category, revealing safety gaps missed by English-only evaluation.

Kazakh is underrepresented in resources for evaluating the safety behavior of large language models. We present KZ-SafetyPrompts, a Kazakh prompt dataset for safety evaluation across eleven categories covering common risk areas such as self-harm, violence, child exploitation, sexual content, racist content, radicalization, and regulated goods or illegal activities. The dataset contains 5,717 prompts written natively in Kazakh (Cyrillic), organized by category, with English translations for cross-lingual analysis. Prompts resemble realistic user queries, often in a teen or child style, and are phrased as intent prompts without procedural instructions. We document the writing protocol, labeling procedures (including borderline-case decision rules), and quality-control steps (schema standardization, completeness checks, and deduplication). We also align the categories with widely used safety taxonomies to support integration with existing evaluation pipelines. Baseline results with GPT-4o show an overall refusal rate of 28.2%, varying from 5.5% to 53.8% across categories, indicating that Kazakh prompts expose category-specific safety gaps not captured by English-only evaluation.
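The per-category and overall refusal rates reported above can be illustrated with a minimal sketch. It assumes each model response has already been labeled as a refusal or not; the function name `refusal_rates`, the record layout, and the toy labels are hypothetical and not taken from the paper. It also assumes the overall rate is computed over all prompts pooled together (rather than as a mean of category rates), which the abstract does not specify.

```python
from collections import defaultdict


def refusal_rates(records):
    """Compute per-category and pooled refusal rates.

    records: iterable of (category, refused) pairs, where refused is a bool.
    The layout is an assumption for illustration, not the dataset's schema.
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for category, refused in records:
        totals[category] += 1
        refusals[category] += int(refused)

    # Fraction of prompts refused within each category.
    per_category = {c: refusals[c] / totals[c] for c in totals}

    # Pooled rate: every prompt counts equally, so large categories weigh more.
    n = sum(totals.values())
    overall = sum(refusals.values()) / n if n else 0.0
    return per_category, overall


# Toy, made-up labels purely to show the call; not results from the paper.
sample = [
    ("self-harm", True),
    ("self-harm", False),
    ("violence", False),
    ("violence", False),
]
per_category, overall = refusal_rates(sample)
```

Because the pooled rate weights categories by size, it can differ from the unweighted mean of the category rates when categories contain different numbers of prompts.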
