CL · AI · Apr 22, 2024

A Survey on the Real Power of ChatGPT

Ming Liu, Ran Liu, Ye Zhu, Hua Wang, Youyang Qu, Rongsheng Li, Yongpan Sheng, Wray Buntine

arXiv:2405.00704v2 · 3 citations · h-index: 28

Originality: Synthesis-oriented

AI Analysis

It provides a critical review for researchers to avoid being misled by superficial results, but is incremental as it synthesizes existing findings without new experiments.

This paper surveys recent studies to uncover the real performance levels of ChatGPT across seven NLP task categories, addressing challenges in evaluation due to its closed-source nature and potential data contamination.

ChatGPT has changed the AI community, and an active line of research is the evaluation of its performance. A key challenge for evaluation is that ChatGPT is still closed-source and traditional benchmark datasets may have been used as its training data. In this paper, we (i) survey recent studies that uncover the real performance levels of ChatGPT in seven categories of NLP tasks, (ii) review the social implications and safety issues of ChatGPT, and (iii) emphasize key challenges and opportunities for its evaluation. We hope our survey can shed some light on its black-box nature, so that researchers are not misled by its surface generation.
