CL · AI · Oct 16, 2024

ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors

arXiv:2410.13057v1 · 11 citations · h-index: 2 · NAACL
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific robustness issue in Chinese NLP that is relevant to both researchers and practitioners. It is incremental in that it evaluates existing models rather than proposing new solutions.

The paper tackles the vulnerability of Chinese NLP models to morphological garden path errors. It shows that word segmentation models make errors on locally ambiguous sentences but not on unambiguous ones, and that sentiment analysis models with character-level tokenization exhibit the same kind of error.

In languages without orthographic word boundaries, NLP models perform word segmentation, either as an explicit preprocessing step or as an implicit step in an end-to-end computation. This paper shows that Chinese NLP models are vulnerable to morphological garden path errors: errors caused by a failure to resolve local word segmentation ambiguities using sentence-level morphosyntactic context. We propose a benchmark, ERAS, that tests a model's vulnerability to morphological garden path errors by comparing its behavior on sentences with and without local segmentation ambiguities. Using ERAS, we show that word segmentation models make garden path errors on locally ambiguous sentences, but do not make equivalent errors on unambiguous sentences. We further show that sentiment analysis models with character-level tokenization make implicit garden path errors, even without an explicit word segmentation step in the pipeline. Our results indicate that models' segmentation of Chinese text often fails to account for morphosyntactic context.
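
To make the idea of a local segmentation ambiguity concrete, here is a minimal sketch using greedy forward maximum matching, a classic dictionary-based segmenter. This is an illustration only: the segmenter, the tiny lexicon, and both sentences are assumptions chosen for the example, not items from ERAS or models evaluated in the paper. It mirrors the benchmark's setup only loosely, by comparing a locally ambiguous sentence with an unambiguous control.

```python
def forward_max_match(text, lexicon, max_len=3):
    """Greedy left-to-right segmentation: always take the longest lexicon match."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character if nothing in the lexicon matches.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon (an assumption for this sketch).
LEXICON = {"研究", "研究生", "探索", "生命", "的", "起源"}

# Locally ambiguous: 研究生 ("graduate student") is a word, but the sentence
# needs 研究 / 生命 ("study" / "life") once the following context is considered.
ambiguous = "研究生命的起源"
# Control with no competing longer match at the start of the sentence.
control = "探索生命的起源"

print(forward_max_match(ambiguous, LEXICON))  # ['研究生', '命', '的', '起源']
print(forward_max_match(control, LEXICON))    # ['探索', '生命', '的', '起源']
```

The greedy segmenter commits to 研究生 before seeing the rest of the sentence and cannot recover, which is the flavor of error the abstract describes: a local ambiguity that only sentence-level context can resolve. The control sentence is segmented correctly because no such competing analysis exists.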

