CL · AI · Apr 27, 2023

ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time

Tsinghua
arXiv:2304.14106v2 · 11 citations · h-index: 30 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses researchers' and practitioners' need for periodic, fine-grained evaluation of ChatGPT, though the contribution is incremental because it builds on existing benchmarks.

The paper tackles the problem of evaluating ChatGPT's performance over time by constructing ChatLog, an ever-updating dataset with large-scale records for 21 NLP benchmarks. It finds that most capabilities improve over time, with some exceptions, and that ChatGPT evolves in a step-wise pattern.

ChatGPT has achieved great success and can be considered to have acquired an infrastructural status. There are abundant works evaluating ChatGPT on benchmarks. However, existing benchmarks face two challenges: (1) disregard for periodic evaluation and (2) lack of fine-grained features. In this paper, we construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks, collected from March 2023 to the present. We conduct a comprehensive performance evaluation and find that most capabilities of ChatGPT improve over time, with the exception of some abilities, and that ChatGPT evolves in a step-wise pattern. We further analyze the inherent characteristics of ChatGPT by extracting knowledge and linguistic features. We find some stable features that stay unchanged across versions and apply them to the detection of ChatGPT-generated texts to improve the robustness of cross-version detection. We will continuously maintain our project at https://github.com/THU-KEG/ChatLog/.
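The abstract does not say how "stable" features are chosen. As a minimal sketch, assuming one simple rule (a feature counts as stable if its coefficient of variation across dated snapshots stays at or below a threshold), the selection step might look like the Python below. The function name `stable_features`, the `max_cv` threshold, the feature names, and all values are hypothetical illustrations, not ChatLog's actual data or method.

```python
from statistics import mean, pstdev


def stable_features(records: dict[str, dict[str, float]], max_cv: float = 0.05) -> list[str]:
    """Return features whose values barely change across model snapshots.

    ASSUMPTION (hypothetical rule, not ChatLog's published criterion):
    a feature is "stable" if pstdev / |mean| across snapshots <= max_cv.
    `records` maps a snapshot label (e.g. a date) to {feature_name: value}.
    """
    if not records:
        return []
    # Only compare features measured in every snapshot.
    shared = set.intersection(*(set(feats) for feats in records.values()))
    stable = []
    for name in sorted(shared):
        values = [feats[name] for feats in records.values()]
        m = mean(values)
        if m == 0:
            continue  # coefficient of variation is undefined for a zero mean
        if pstdev(values) / abs(m) <= max_cv:
            stable.append(name)
    return stable


# Hypothetical per-snapshot averages of three made-up linguistic features.
records = {
    "2023-03-01": {"avg_sentence_len": 20.0, "type_token_ratio": 0.50, "entity_count": 4.0},
    "2023-04-01": {"avg_sentence_len": 20.4, "type_token_ratio": 0.51, "entity_count": 7.0},
    "2023-05-01": {"avg_sentence_len": 19.8, "type_token_ratio": 0.50, "entity_count": 10.0},
}

print(stable_features(records))  # ['avg_sentence_len', 'type_token_ratio']
```

Features that pass such a filter would be candidates for a cross-version detector, since their values should not drift when the underlying model version changes; a real pipeline would also need to confirm that they still separate human-written from ChatGPT-generated text.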

Code Implementations: 1 repo
