CL · Sep 4, 2025

Can Language Models Handle a Non-Gregorian Calendar? The Case of the Japanese wareki

arXiv:2509.04432v3 · 2 citations · h-index: 5 · IJCNLP-AACL
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for language models to better support culture-specific tasks such as calendar understanding, which matters for users in regions that use non-Gregorian calendars. Its contribution is incremental, since it focuses on evaluation rather than on new methods.

The paper evaluates how well language models handle a non-Gregorian calendar, specifically the Japanese wareki. It finds that models such as GPT-4o and Deepseek V3 struggle with calendar arithmetic and calendar knowledge, and its error analysis points to the corpus frequency of Japanese calendar expressions and a Gregorian bias as likely causes.

Temporal reasoning and knowledge are essential capabilities for language models (LMs). While much prior work has analyzed and improved temporal reasoning in LMs, most studies have focused solely on the Gregorian calendar. However, many non-Gregorian systems, such as the Japanese, Hijri, and Hebrew calendars, are in active use and reflect culturally grounded conceptions of time. Whether, and how well, current LMs can handle such non-Gregorian calendars has not yet been evaluated. Here, we present a systematic evaluation of how well language models handle one such non-Gregorian system: the Japanese wareki. We create datasets that require temporal knowledge and reasoning involving wareki dates. Evaluating open and closed LMs, we find that some models can perform calendar conversions, but GPT-4o, Deepseek V3, and even Japanese-centric models struggle with Japanese calendar arithmetic and knowledge involving wareki dates. Error analysis suggests the corpus frequency of Japanese calendar expressions and a Gregorian bias in the models' knowledge as possible explanations. Our results show the importance of developing LMs that are better equipped for culture-specific tasks such as calendar understanding.
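To make "calendar conversion" and "calendar arithmetic" concrete, here is a minimal Python sketch of wareki conversion for the five modern eras. It is not taken from the paper or its datasets; the function names `to_wareki` and `to_gregorian_year` are hypothetical, era names are romanized without macrons, and dates before 1873-01-01 (when Japan adopted the Gregorian calendar) are deliberately rejected rather than converted from the older lunisolar calendar. The key rule is that era year 1 is the Gregorian year in which the era began, so a transition year belongs to two eras and needs a full date to resolve.

```python
from datetime import date

# Gregorian start dates of the modern Japanese eras, newest first.
ERAS = [
    ("Reiwa", date(2019, 5, 1)),
    ("Heisei", date(1989, 1, 8)),
    ("Showa", date(1926, 12, 25)),
    ("Taisho", date(1912, 7, 30)),
    ("Meiji", date(1868, 10, 23)),
]
# Assumption of this sketch: earlier (lunisolar) dates are out of scope.
GREGORIAN_ADOPTION = date(1873, 1, 1)


def to_wareki(d: date) -> tuple[str, int]:
    """Return (era name, era year) for a Gregorian date."""
    if d < GREGORIAN_ADOPTION:
        raise ValueError("dates before 1873-01-01 are not handled in this sketch")
    for name, start in ERAS:
        if d >= start:
            # Era year 1 is the calendar year in which the era began.
            return name, d.year - start.year + 1
    raise AssertionError("unreachable: every date >= 1873-01-01 falls in an era")


def to_gregorian_year(era: str, era_year: int) -> int:
    """Return the Gregorian year for an era year (year-level conversion only)."""
    for i, (name, start) in enumerate(ERAS):
        if name != era:
            continue
        year = start.year + era_year - 1
        # An era's last year is the year its successor began (Reiwa has no end yet).
        last_year = ERAS[i - 1][1].year if i > 0 else None
        if era_year < 1 or (last_year is not None and year > last_year):
            raise ValueError(f"{era} {era_year} does not exist")
        return year
    raise ValueError(f"unknown era: {era}")
```

Calendar arithmetic can then go through the Gregorian year: ten years after Heisei 25 is `to_gregorian_year("Heisei", 25) + 10`, i.e. 2023, and `to_wareki(date(2023, 4, 1))` gives `("Reiwa", 5)`. Note that `to_wareki` needs a full date, because 2019 is Heisei 31 before May 1 and Reiwa 1 from May 1 onward.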
