HCAI · Mar 22, 2023

Generate labeled training data using Prompt Programming and GPT-3. An example of Big Five Personality Classification

arXiv:2303.12279v1 · 3 citations · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses the data scarcity problem faced by personality classification researchers, though the contribution is incremental: it applies existing methods to new data.

The researchers tackled the problem of generating labeled training data for Big Five personality classification by using prompt programming with GPT-3 to create 25,000 labeled conversations, then trained classification models achieving 0.71 accuracy on generated data and 0.65 on real datasets.

We generated 25,000 conversations labeled with Big Five personality traits using prompt programming with GPT-3. We then trained Big Five classification models on these data and evaluated them on 2,500 samples from generated dialogues and real conversational datasets labeled with Big Five traits by human annotators. The results indicate that this approach is promising for creating effective training data. We then compared performance across different training approaches and models. Our results suggest that using Adapter-Transformers and transfer learning from a pre-trained RoBERTa sentiment analysis model performs best with the generated data. Our best model obtained an accuracy of 0.71 on generated data and 0.65 on real datasets. Finally, we discuss the approach's potential limitations and a confidence metric.
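
The data-generation step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the prompt wording, the binary high/low labels per trait, and the `complete` callable standing in for a GPT-3 completion call are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# The five trait names are standard. The binary high/low labeling is an
# assumption for this sketch, not necessarily the paper's label scheme.
BIG_FIVE = [
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
]
LEVELS = ["high", "low"]


def build_prompt(trait: str, level: str) -> str:
    """Build a hypothetical prompt that conditions the model on a label."""
    if trait not in BIG_FIVE:
        raise ValueError(f"unknown trait: {trait}")
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return (
        "The following is a conversation between two people. "
        f"Speaker A shows {level} {trait}.\n\nA:"
    )


def generate_labeled_data(
    complete: Callable[[str], str], per_class: int
) -> List[Dict[str, str]]:
    """Collect completions; the label comes from the prompt, not from annotation."""
    records = []
    for trait in BIG_FIVE:
        for level in LEVELS:
            prompt = build_prompt(trait, level)
            for _ in range(per_class):
                records.append(
                    {"text": complete(prompt), "trait": trait, "level": level}
                )
    return records


def fake_complete(prompt: str) -> str:
    """Stand-in for a language-model call so the sketch runs offline."""
    return "Hi! I was just thinking about what we talked about yesterday."


if __name__ == "__main__":
    data = generate_labeled_data(fake_complete, per_class=2)
    print(len(data), data[0]["trait"], data[0]["level"])
```

The point of the design is that each record inherits its label from the prompt that produced it, so no human annotation is needed for the generated set. In practice `fake_complete` would be replaced by a real completion call.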

