CL | Mar 16, 2025

HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs

arXiv:2503.12440v28 citations | h-index: 5 | Has Code | Proceedings of the 29th Conference on Computational Natural Language Learning
Originality: Synthesis-oriented
AI Analysis

The benchmark addresses a gap in NLP evaluation for Cantonese speakers and researchers, though the contribution is incremental: it creates a new benchmark rather than advancing model capabilities.

The paper tackles the lack of evaluation datasets for Cantonese language understanding by introducing HKCanto-Eval, a benchmark that assesses large language models on tasks incorporating Hong Kong's cultural nuances. It finds that proprietary models generally outperform open-weight ones, but that both still show significant limitations in handling Cantonese-specific linguistic and cultural knowledge.

The ability of language models to comprehend and interact in diverse linguistic and cultural landscapes is crucial. The Cantonese language used in Hong Kong presents unique challenges for natural language processing due to its rich cultural nuances and lack of dedicated evaluation datasets. The HKCanto-Eval benchmark addresses this gap by evaluating the performance of large language models (LLMs) on Cantonese language understanding tasks, extending to English and Written Chinese for cross-lingual evaluation. HKCanto-Eval integrates cultural and linguistic nuances intrinsic to Hong Kong, providing a robust framework for assessing language models in realistic scenarios. Additionally, the benchmark includes questions designed to tap into the underlying linguistic metaknowledge of the models. Our findings indicate that while proprietary models generally outperform open-weight models, significant limitations remain in handling Cantonese-specific linguistic and cultural knowledge, highlighting the need for more targeted training data and evaluation methods. The code can be accessed at https://github.com/hon9kon9ize/hkeval2025

