CL AI LG Sep 18, 2023

Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark

Conghui Niu, Mengyang Hu, Lin Bo, Xiaoli He, Dong Yu, Pengyuan Liu

arXiv:2309.09602v1 0.5 h-index: 74

Originality: Synthesis-oriented

AI Analysis

The paper addresses a gap in natural language understanding for Chinese, though the contribution is incremental: it builds on existing classification methods and adds a new dataset and evaluation.

The paper tackles the problem of classifying Chinese propositions, which often lack explicit logical connectives, by introducing the concepts of explicit and implicit propositions and building a multi-level classification system. The authors built a large-scale dataset, PEACE, and evaluated models such as BERT and ChatGPT, finding that BERT performs well but lacks cross-domain transferability, while ChatGPT's performance improves when it is given more proposition information.

Existing proposition classification often relies on logical constants. Compared with Western languages such as English, which lean towards hypotaxis, Chinese often relies on semantic or logical understanding rather than logical connectives in daily expressions, exhibiting the characteristics of parataxis. Existing research has rarely paid attention to this issue, yet accurately classifying these propositions is crucial for natural language understanding and reasoning. In this paper, we put forward the concepts of explicit and implicit propositions and propose a comprehensive multi-level proposition classification system based on linguistics and logic. Correspondingly, we create PEACE, a large-scale Chinese proposition dataset drawn from multiple domains and covering all categories related to propositions. To evaluate the Chinese proposition classification ability of existing models and explore their limitations, we conduct evaluations on PEACE using several methods, including a rule-based method, SVM, BERT, RoBERTa, and ChatGPT. Results show the importance of properly modeling the semantic features of propositions. BERT has relatively good proposition classification capability but lacks cross-domain transferability. ChatGPT performs poorly, but its classification ability can be improved by providing more proposition information. Many issues are still far from being resolved and require further study.
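To illustrate why connective-based rules can miss implicit propositions, here is a minimal sketch of a keyword matcher. The connective table, the label names, and the function `classify_by_connective` are hypothetical choices made for this illustration; they are not the PEACE taxonomy or the rule-based baseline evaluated in the paper.

```python
# Hypothetical connective-to-label table for illustration only.
# It is NOT the PEACE category system or the paper's rule set.
CONNECTIVE_LABELS = {
    "如果": "conditional",   # "if"
    "只要": "conditional",   # "as long as"
    "或者": "disjunctive",   # "or"
    "并且": "conjunctive",   # "and"
}


def classify_by_connective(sentence: str) -> str:
    """Return the label of the first connective (in table order) that
    occurs in the sentence, or "unknown" if none occurs."""
    for connective, label in CONNECTIVE_LABELS.items():
        if connective in sentence:
            return label
    return "unknown"


# Explicit: the connective 如果 ("if") is present.
explicit = "如果明天下雨,我们就取消比赛。"
# Implicit: the same conditional meaning, expressed without a connective.
implicit = "明天下雨,比赛取消。"

print(classify_by_connective(explicit))
print(classify_by_connective(implicit))
```

The second sentence carries the same conditional reading as the first but contains none of the listed connectives, so surface matching alone cannot label it. This is the kind of case the abstract's explicit/implicit distinction is meant to capture.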
