CL · May 23, 2025

EXECUTE: A Multilingual Benchmark for LLM Token Understanding

arXiv:2505.17784v1 · 1 citation · h-index: 2 · ACL
Originality: Synthesis-oriented
AI Analysis

This work addresses how to evaluate LLM token understanding across diverse languages. Its contribution is incremental, since it builds on an existing benchmark (CUTE).

The authors extend the CUTE benchmark to multiple languages with diverse scripts and introduce EXECUTE to assess LLM token understanding. Their tests show that the difficulties vary by language: some languages show word-level issues, and others show no issues at all. They also test sub-character tasks in Chinese, Japanese, and Korean.

The CUTE benchmark showed that LLMs struggle with character understanding in English. We extend it to more languages with diverse scripts and writing systems, introducing EXECUTE. Our simplified framework allows easy expansion to any language. Tests across multiple LLMs reveal that challenges in other languages are not always on the character level as in English. Some languages show word-level processing issues, some show no issues at all. We also examine sub-character tasks in Chinese, Japanese, and Korean to assess LLMs' understanding of character components.
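
To make the three levels mentioned in the abstract concrete (word, character, sub-character), here is a minimal Python sketch. It is not taken from the paper and does not reproduce EXECUTE's task definitions: the function names are hypothetical, and the Korean example assumes that "character components" can be approximated by the jamo produced by Unicode NFD normalization. Chinese and Japanese component decomposition needs external data and is not shown.

```python
import unicodedata


def spell_out(text: str) -> list[str]:
    """Character-level view: split a string into its code points.

    Hypothetical helper, not part of EXECUTE.
    """
    return list(text)


def reverse_words(sentence: str) -> str:
    """Word-level view: reverse the order of whitespace-separated words.

    Hypothetical helper, not part of EXECUTE.
    """
    return " ".join(reversed(sentence.split()))


def hangul_components(syllable: str) -> list[str]:
    """Sub-character view: decompose precomposed Hangul into jamo.

    Assumption: NFD normalization is used here only as a stand-in for
    "character components"; the benchmark may define them differently.
    """
    return list(unicodedata.normalize("NFD", syllable))


if __name__ == "__main__":
    print(spell_out("token"))
    print(reverse_words("models read tokens"))
    print(hangul_components("한"))
```

A model that operates on subword tokens never sees these units directly, which is why tasks at each level can expose different weaknesses depending on the script.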

