TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages
This work addresses the critical gap in safety evaluation of LLMs for low-resource African languages, highlighting vulnerabilities that could be exploited in culturally adapted contexts.
TukaBench is a culturally grounded jailbreak benchmark for seven African languages. It finds that prompting in these languages lowers LLM refusal rates relative to English, with culturally adapted prompts producing the lowest refusal rates. The study also reveals two structural limitations: model comprehension failures and reduced reliability of LLM-as-a-judge evaluation in low-resource languages.
Safety evaluation of Large Language Models (LLMs) remains heavily English-centric, leaving Low-Resource Languages (LRLs), particularly African ones, critically underexplored. We introduce TukaBench, a jailbreak benchmark for seven African languages that extends JailbreakBench (JBB) beyond direct translation through four settings: (i) human translation of JBB prompts, (ii) English adaptation to African contexts followed by human translation, (iii) human-curated prompts validated through interactions with GPT-5.2, and (iv) code-switched prompts combining English and African languages. Together, these settings isolate the effects of language, cultural grounding, and prompt evasiveness on model safety. Across closed and open models, prompting in African languages reduces refusal relative to English, and culturally adapted prompts yield the lowest refusal rates. The evaluation also surfaces two structural limitations: model comprehension failures and reduced LLM-as-a-judge reliability in LRLs. To capture the first, we introduce a third response label, Deflection, alongside Refused and Jailbroken. To assess the second, we validate outputs against human annotations, showing that judge-human agreement drops in lower-resource languages and in less commonly supported scripts.
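As an illustration only, the three response labels and the judge-human comparison described above could be tallied along the following lines. This is a minimal sketch under stated assumptions, not the benchmark's actual evaluation code: the abstract does not say which agreement statistic is used, so raw agreement and Cohen's kappa are shown as two common choices, and the function names and the toy label lists are hypothetical.

```python
from collections import Counter

# The three response labels named in the abstract.
LABELS = ("Refused", "Jailbroken", "Deflection")


def _check(judge, human):
    """Reject empty or mismatched label lists before computing statistics."""
    if not judge or len(judge) != len(human):
        raise ValueError("label lists must be non-empty and of equal length")


def label_rates(labels):
    """Return the fraction of responses assigned to each label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return {lab: counts[lab] / len(labels) for lab in LABELS}


def raw_agreement(judge, human):
    """Fraction of items on which the LLM judge and the human annotator agree."""
    _check(judge, human)
    return sum(j == h for j, h in zip(judge, human)) / len(judge)


def cohens_kappa(judge, human):
    """Chance-corrected agreement (Cohen's kappa) between two annotators."""
    _check(judge, human)
    n = len(judge)
    p_observed = raw_agreement(judge, human)
    judge_counts, human_counts = Counter(judge), Counter(human)
    # Expected agreement if both annotators labeled independently
    # according to their own marginal label frequencies.
    p_expected = sum(judge_counts[l] * human_counts[l] for l in LABELS) / (n * n)
    if p_expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels for one language; NOT data from the benchmark.
judge_labels = ["Refused", "Refused", "Jailbroken",
                "Deflection", "Jailbroken", "Refused"]
human_labels = ["Refused", "Jailbroken", "Jailbroken",
                "Deflection", "Jailbroken", "Deflection"]

rates = label_rates(judge_labels)
agreement = raw_agreement(judge_labels, human_labels)
kappa = cohens_kappa(judge_labels, human_labels)
print(rates, round(agreement, 3), round(kappa, 3))
```

Running the same computation separately per language and per script would give the kind of comparison the abstract describes, where agreement between judge and human can be contrasted across higher- and lower-resource settings.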