CL · AI · Jan 16, 2024

Code-Based English Models Surprising Performance on Chinese QA Pair Extraction Task

Linghan Zheng, Hui Liu, Xiaojun Lin, Jiayuan Dong, Yue Sheng, Gang Shi, Zhiwei Liu, Hongwei Chen

arXiv:2401.10286v3

Originality: Synthesis-oriented

AI Analysis

This work addresses QA pair extraction for Chinese-language tasks. It is incremental, since it builds on the known advantages of code-based models in reasoning-intensive scenarios.

The study found that code-based English models perform exceptionally well on Chinese QA pair extraction, and that code-based models whose training data includes some Chinese perform even better. The authors argue this offers a new perspective on the "Chinese Room" thought experiment.

In previous studies, code-based models have consistently outperformed text-based models in reasoning-intensive scenarios. While generating our knowledge base for Retrieval-Augmented Generation (RAG), we observed that code-based models also perform exceptionally well on the Chinese QA pair extraction task. Further, through our experiments and the metrics we designed, we found that code-based models containing a certain amount of Chinese data achieve even better performance. Additionally, the capabilities of code-based English models on specified Chinese tasks offer a distinct perspective on the philosophical "Chinese Room" thought experiment.

