How Much Do LLMs Know About Chinese Zero Pronouns?
This work highlights a significant limitation of LLMs in processing Chinese zero pronouns, one that affects the accuracy of Chinese NLP applications.
This paper investigates how well LLMs handle Chinese Zero Pronouns (ZPs) across a range of tasks: identification, referentiality and referential type classification, resolution, and translation. The study finds that LLMs struggle considerably with ZPs, especially on upstream tasks such as identification and referentiality classification, and that even state-of-the-art reasoning-oriented models correctly translate fewer than half of Chinese ZPs into English.
Zero Pronouns (ZPs) are a pervasive linguistic phenomenon in pro-drop languages such as Chinese and have long posed a challenge for natural language processing systems. Although Large Language Models (LLMs) perform well on many Chinese language tasks, their ability to process ZPs remains poorly understood. We conduct a systematic investigation of LLMs' handling of Chinese ZPs through a sequence of linguistically motivated tasks, including identification, referentiality classification, referential type classification, resolution, and translation. A diverse set of LLMs is evaluated across all tasks. Our results show that Chinese ZPs remain highly challenging for current LLMs, particularly for upstream tasks such as identification and referentiality classification. Performance on downstream tasks, such as ZP translation, is also consistently low: even state-of-the-art reasoning-oriented LLMs correctly translate fewer than half of Chinese ZPs into English.
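To make the task sequence concrete, the sketch below stores one Chinese ZP as a record with one field per task, from identification down to translation, and scores translation with a simple exact-match accuracy. This is a minimal illustration under stated assumptions, not the paper's data format or metric. The class `ZeroPronoun`, the function `translation_accuracy`, the field names, and the `"anaphoric"` label are all hypothetical. The example sentence 我买了一本书，很好看。 ("I bought a book; [it] is very good.") omits the subject of 很好看, which refers back to 书 and surfaces in English as "it".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ZeroPronoun:
    """Hypothetical record for one Chinese ZP, with one field per task in the sequence."""
    sentence: str
    gap_index: int                  # identification: the dropped pronoun sits just before this character
    referential: bool               # referentiality: does the gap refer to an entity at all?
    referent_type: Optional[str]    # referential type (label set assumed, e.g. "anaphoric")
    antecedent: Optional[str]       # resolution: overt mention the gap refers to, if any
    gold_pronoun_en: Optional[str]  # translation: English pronoun that should be restored


def translation_accuracy(items: List[ZeroPronoun], predicted: List[Optional[str]]) -> float:
    """Share of ZPs whose predicted English pronoun exactly matches the gold one.

    Exact, case-insensitive match is an assumed proxy metric, not necessarily the paper's.
    ZPs with no gold English pronoun are skipped.
    """
    if len(items) != len(predicted):
        raise ValueError("one prediction per zero pronoun is required")
    pairs = [(zp.gold_pronoun_en, pred) for zp, pred in zip(items, predicted)
             if zp.gold_pronoun_en is not None]
    if not pairs:
        return 0.0
    correct = sum(1 for gold, pred in pairs
                  if pred is not None and pred.strip().lower() == gold.lower())
    return correct / len(pairs)


# 我买了一本书，很好看。 = "I bought a book; [it] is very good."
example = ZeroPronoun(
    sentence="我买了一本书，很好看。",
    gap_index=7,               # the omitted subject of 很好看 precedes 很
    referential=True,
    referent_type="anaphoric",
    antecedent="书",
    gold_pronoun_en="it",
)

if __name__ == "__main__":
    # One correct and one incorrect prediction for the same ZP.
    print(translation_accuracy([example, example], ["it", "he"]))  # 0.5
```

Under this toy metric, "fewer than half of ZPs translated correctly" corresponds to a `translation_accuracy` below 0.5.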