Navigating the Prompt Space: Improving LLM Classification of Social Science Texts Through Prompt Engineering
This work addresses the need for improved, cost-effective text classification in the social sciences, but its contribution is incremental: it builds on existing prompt engineering methods rather than introducing a new paradigm.
The paper tackles the problem of maximizing LLM performance for social science text classification by systematically varying aspects of prompt engineering. It finds that a minimal increase in prompt context yields the largest performance gain, while further increases often bring only marginal gains or even lower accuracy.
Recent developments in text classification using Large Language Models (LLMs) in the social sciences suggest that costs can be cut significantly, while performance can sometimes rival existing computational methods. However, given the wide variance in performance in current tests, we turn to the question of how to maximize performance. In this paper, we focus on prompt context as a possible avenue for increasing accuracy by systematically varying three aspects of prompt engineering: label descriptions, instructional nudges, and few-shot examples. Across two different examples, our tests illustrate that a minimal increase in prompt context yields the highest increase in performance, while further increases in context tend to yield only marginal performance increases. Alarmingly, increasing prompt context sometimes decreases accuracy. Furthermore, our tests suggest substantial heterogeneity across models, tasks, and batch sizes, underlining the need for individual validation of each LLM coding task rather than reliance on general rules.
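To make the experimental design concrete, the following is a minimal sketch of how prompt variants could be enumerated over the three components named above (label descriptions, instructional nudges, and few-shot examples). It is an illustration under stated assumptions, not the authors' implementation: the labels, the nudge wording, the example texts, and the function names `build_prompt`, `prompt_grid`, and `accuracy` are all hypothetical, and no LLM is called.

```python
from itertools import product

# Hypothetical labels, nudge, and examples for illustration only;
# none of these come from the paper.
LABELS = {
    "positive": "The text expresses approval or support.",
    "negative": "The text expresses criticism or opposition.",
}
NUDGE = "Answer with exactly one label and nothing else."
EXAMPLES = [
    ("I fully back this proposal.", "positive"),
    ("This policy is a disaster.", "negative"),
]


def build_prompt(text, use_descriptions, use_nudge, n_shots):
    """Assemble one prompt variant from the three context components."""
    parts = ["Classify the text into one of these labels:"]
    for label, description in LABELS.items():
        # With descriptions off, only the bare label name is listed.
        parts.append(f"- {label}: {description}" if use_descriptions else f"- {label}")
    if use_nudge:
        parts.append(NUDGE)
    for example_text, example_label in EXAMPLES[:n_shots]:
        parts.append(f"Text: {example_text}\nLabel: {example_label}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n".join(parts)


def prompt_grid(text, shot_counts=(0, 2)):
    """Return every combination of the three components, keyed by its settings."""
    grid = {}
    for settings in product((False, True), (False, True), shot_counts):
        grid[settings] = build_prompt(text, *settings)
    return grid


def accuracy(predictions, gold):
    """Share of predictions that match the hand-coded gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

Each key in the grid records which components were switched on, so accuracy can be computed per variant against a hand-coded validation set. Consistent with the abstract's conclusion, such a comparison would need to be repeated for each model, task, and batch size rather than carried over from one setting to another.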