CL · Mar 14, 2024

AI on AI: Exploring the Utility of GPT as an Expert Annotator of AI Publications

arXiv:2403.09097v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient data annotation in AI research by offering an automated alternative that reduces reliance on human experts. The advance is incremental: it builds on existing chatbot and model fine-tuning methods.

The paper tackles the cost of expert annotation for identifying AI research publications by using GPT models as automated annotators. The GPT pipeline labels AI publications with 94% accuracy, and a classifier trained on GPT-labeled data reaches 82% accuracy, nine percentage points above the arXiv-trained model.

Identifying scientific publications that are within a dynamic field of research often requires costly annotation by subject-matter experts. Resources like widely accepted classification criteria or field taxonomies are unavailable for a domain like artificial intelligence (AI), which spans emerging topics and technologies. We address these challenges by inferring a functional definition of AI research from existing expert labels, and then evaluating state-of-the-art chatbot models on the task of expert data annotation. Using the arXiv publication database as ground truth, we experiment with prompt engineering for GPT chatbot models to identify an alternative, automated expert annotation pipeline that assigns AI labels with 94% accuracy. For comparison, we fine-tune SPECTER, a transformer language model pre-trained on scientific publications, which achieves 96% accuracy (only two percentage points higher than GPT) on classifying AI publications. Our results indicate that with effective prompt engineering, chatbots can be used as reliable data annotators even where subject-area expertise is required. To evaluate the utility of chatbot-annotated datasets on downstream classification tasks, we train a new classifier on GPT-labeled data and compare its performance to the arXiv-trained model. The classifier trained on GPT-labeled data outperforms the arXiv-trained model by nine percentage points, achieving 82% accuracy.
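The annotation pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or prompt: the prompt wording, the function names (`build_prompt`, `parse_label`, `annotate`, `accuracy`), and the `keyword_stub` stand-in for a chatbot are all assumptions made for illustration. A real run would replace the stub with a call to a GPT model and score the labels against arXiv category labels.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical prompt; the paper's actual prompts are not reproduced here.
PROMPT_TEMPLATE = (
    "You are an expert in artificial intelligence research.\n"
    "Decide whether the following paper is AI research.\n"
    "Answer with exactly one word: YES or NO.\n\n"
    "Title: {title}\nAbstract: {abstract}\n"
)


def build_prompt(title: str, abstract: str) -> str:
    """Fill the annotation prompt for one publication."""
    return PROMPT_TEMPLATE.format(title=title, abstract=abstract)


def parse_label(reply: str) -> bool:
    """Map a free-text model reply to an AI / not-AI label."""
    return reply.strip().upper().startswith("YES")


def annotate(papers: Sequence[Dict[str, str]],
             ask_model: Callable[[str], str]) -> List[bool]:
    """Label each paper by sending its prompt to `ask_model`."""
    return [
        parse_label(ask_model(build_prompt(p["title"], p["abstract"])))
        for p in papers
    ]


def accuracy(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Fraction of predicted labels that match the reference labels."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def keyword_stub(prompt: str) -> str:
    """Stand-in for a chatbot call (assumption): answers by keyword match."""
    return "YES" if "neural" in prompt.lower() else "NO"


# Toy records invented for illustration; not data from the paper.
papers = [
    {"title": "Neural networks for image recognition",
     "abstract": "We train a deep neural model."},
    {"title": "Galaxy rotation curves",
     "abstract": "We measure stellar velocities."},
]
reference_labels = [True, False]

labels = annotate(papers, keyword_stub)
print(labels, accuracy(labels, reference_labels))
```

Keeping the model call behind the `ask_model` parameter lets the same scoring code compare different prompts or models against one set of reference labels.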
