CL · May 23, 2023

Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science

Yida Mu, Ben P. Wu, William Thorne, Ambrose Robinson, Nikolaos Aletras, Carolina Scarton, Kalina Bontcheva, Xingyi Song

arXiv:2305.14310v3 · 18.488 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of optimizing prompt design for zero-shot classification in LLMs, particularly for social science researchers, but it is incremental as it builds on existing prompting studies.

The study evaluated zero-shot performance of ChatGPT and OpenAssistant on six Computational Social Science classification tasks, finding they underperformed compared to fine-tuned BERT-large, with prompting strategies causing accuracy and F1 score variations over 10%.

Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific prompts. However, due to the computational demands associated with training these models, their applications often adopt a zero-shot setting. In this paper, we evaluate the zero-shot performance of two publicly accessible LLMs, ChatGPT and OpenAssistant, in the context of six Computational Social Science classification tasks, while also investigating the effects of various prompting strategies. Our experiments investigate the impact of prompt complexity, including the effect of incorporating label definitions into the prompt; use of synonyms for label names; and the influence of integrating past memories during foundation model training. The findings indicate that in a zero-shot setting, current LLMs are unable to match the performance of smaller, fine-tuned baseline transformer models (such as BERT-large). Additionally, we find that different prompting strategies can significantly affect classification accuracy, with variations in accuracy and F1 scores exceeding 10%.
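
The prompt-complexity variations described in the abstract (plain label names, label definitions, and synonyms for label names) can be sketched as follows. This is only an illustrative sketch: the function `build_prompt`, the template wording, and the example labels, definitions, and synonym are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional


def build_prompt(
    text: str,
    labels: Dict[str, str],
    include_definitions: bool = False,
    synonyms: Optional[Dict[str, str]] = None,
) -> str:
    """Assemble a zero-shot classification prompt (hypothetical template)."""
    synonyms = synonyms or {}
    lines = ["Classify the following text into one of these categories:"]
    for name, definition in labels.items():
        # Optionally show a synonym in place of the original label name.
        shown = synonyms.get(name, name)
        if include_definitions:
            lines.append(f"- {shown}: {definition}")
        else:
            lines.append(f"- {shown}")
    lines.append(f"Text: {text}")
    lines.append("Answer with the category name only.")
    return "\n".join(lines)


# Hypothetical labels and definitions, for illustration only.
LABELS = {
    "rumour": "an unverified claim circulating at the time of posting",
    "non-rumour": "a post that does not contain an unverified claim",
}

basic = build_prompt("Example post.", LABELS)
with_defs = build_prompt("Example post.", LABELS, include_definitions=True)
with_syns = build_prompt("Example post.", LABELS, synonyms={"rumour": "hearsay"})
```

Sending each variant to the same model and comparing accuracy and F1 against gold labels is one way to measure how much the prompt wording alone shifts the results.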
