Oct 18, 2023

Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models

arXiv:2310.12049v3 · 6 citations · h-index: 46
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of scoring short texts, such as political tweets, for researchers studying democratic backsliding. Its contribution is incremental: it builds on existing LLM and prompting techniques.

The authors tackle the problem of scoring short texts without large labeled datasets. Their concept-guided chain-of-thought (CGCoT) prompting method uses large language models to compare texts pairwise. It achieves stronger correlations with human judgments than unsupervised methods such as Wordfish and matches the performance of RoBERTa-Large fine-tuned on thousands of labeled tweets.
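The two LLM stages can be illustrated with a minimal sketch. First, a researcher-written chain of prompts produces a concept-specific breakdown of each text. Second, the LLM judges which of two breakdowns shows more of the concept. Everything named here is hypothetical: `llm` stands for any function that maps a prompt to a completion (no particular model or API is assumed). The prompt wording, the concept label, `toy_llm`, and the example tweets are illustrative and are not the authors' materials.

```python
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]

# Illustrative prompt chain for one concept (not the authors' prompts).
CGCOT_PROMPTS = [
    "Summarize the tweet.",
    "Which political party, if any, does the tweet mention?",
    "Does the tweet express negative feelings toward that party? Explain.",
]


def cgcot_breakdown(text: str, llm: LLM, prompts: List[str] = CGCOT_PROMPTS) -> str:
    """Run the prompt chain, feeding every earlier question and answer forward."""
    transcript = f"Tweet: {text}"
    for question in prompts:
        answer = llm(f"{transcript}\nQ: {question}\nA:")
        transcript += f"\nQ: {question}\nA: {answer}"
    return transcript


def compare(breakdown_a: str, breakdown_b: str, llm: LLM, concept: str) -> bool:
    """Return True if the LLM judges breakdown A to show more of the concept."""
    prompt = (
        f"Breakdown A:\n{breakdown_a}\n\nBreakdown B:\n{breakdown_b}\n\n"
        f"Which breakdown expresses more {concept}? Answer 'A' or 'B'."
    )
    reply = llm(prompt).strip().upper()
    if reply.startswith("A"):
        return True
    if reply.startswith("B"):
        return False
    raise ValueError(f"unparseable comparison reply: {reply!r}")


def pairwise_judgments(texts: List[str], llm: LLM, concept: str) -> List[Tuple[int, int]]:
    """Compare every pair of breakdowns once; return (winner, loser) index pairs."""
    breakdowns = [cgcot_breakdown(t, llm) for t in texts]
    judgments = []
    for i, j in combinations(range(len(texts)), 2):
        a_wins = compare(breakdowns[i], breakdowns[j], llm, concept)
        judgments.append((i, j) if a_wins else (j, i))
    return judgments


def toy_llm(prompt: str) -> str:
    """Stand-in model: comparisons favor the side with more '!' characters."""
    if "Which breakdown" in prompt:
        side_a, side_b = prompt.split("Breakdown B:")
        return "A" if side_a.count("!") >= side_b.count("!") else "B"
    return "n/a"  # breakdown steps return a placeholder answer


texts = ["Vote!", "Party X ruins everything!!!", "Lunch was fine."]
judgments = pairwise_judgments(texts, toy_llm, concept="party aversion")
print(judgments)  # [(1, 0), (0, 2), (1, 2)]
```

Each judgment is returned as a (winner, loser) index pair. This is the input form that a pairwise probability model, such as Bradley-Terry, expects. Comparing every pair grows quadratically with the number of texts, so larger corpora would likely need a sampled subset of pairs.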

Existing text scoring methods require a large corpus, struggle with short texts, or require hand-labeled data. We develop a text scoring framework that leverages generative large language models (LLMs) to (1) set texts against the backdrop of information from the near-totality of the web and digitized media, and (2) effectively transform pairwise text comparisons from a reasoning problem into a pattern recognition task. Our approach, concept-guided chain-of-thought (CGCoT), utilizes a chain of researcher-designed prompts with an LLM to generate a concept-specific breakdown for each text, akin to guidance provided to human coders. We then pairwise compare breakdowns using an LLM and aggregate answers into a score using a probability model. We apply this approach to better understand speech reflecting aversion to specific political parties on Twitter, a topic that has commanded increasing interest because of its potential contributions to democratic backsliding. We achieve stronger correlations with human judgments than widely used unsupervised text scoring methods like Wordfish. In a supervised setting, our measures require no hand-labeled data beyond a small pilot dataset used to develop the CGCoT prompts, yet they produce predictions on par with RoBERTa-Large fine-tuned on thousands of hand-labeled tweets. This project showcases the potential of combining human expertise and LLMs for scoring tasks.
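The abstract says only that pairwise answers are aggregated "using a probability model." A Bradley-Terry model is one standard choice for this. The sketch below assumes it and fits the model with Hunter's (2004) minorization-maximization (MM) updates in plain Python. The function name `bradley_terry_scores` and the toy judgments are hypothetical. Finite maximum-likelihood scores exist only when the comparison graph is strongly connected. The code checks a necessary part of that condition: every text must win and lose at least once.

```python
import math
from collections import defaultdict
from typing import Iterable, List, Tuple


def bradley_terry_scores(
    comparisons: Iterable[Tuple[int, int]], n_items: int,
    max_iter: int = 1000, tol: float = 1e-10,
) -> List[float]:
    """Fit Bradley-Terry strengths with Hunter's (2004) MM updates.

    comparisons: (winner, loser) index pairs from the pairwise judgments.
    Returns log-strengths centered to mean zero; higher means more of the concept.
    """
    wins = [0] * n_items
    losses = [0] * n_items
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    for w, l in comparisons:
        wins[w] += 1
        losses[l] += 1
        pair_counts[(min(w, l), max(w, l))] += 1
    if min(wins) == 0 or min(losses) == 0:
        # A text that never wins (or never loses) has no finite MLE score.
        raise ValueError("every text must win and lose at least one comparison")

    p = [1.0] * n_items
    for _ in range(max_iter):
        denom = [0.0] * n_items
        for (i, j), n_ij in pair_counts.items():
            share = n_ij / (p[i] + p[j])
            denom[i] += share
            denom[j] += share
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        new_p = [wins[i] / denom[i] for i in range(n_items)]
        # Strengths are identified only up to scale: fix the geometric mean at 1.
        gm = math.exp(sum(math.log(x) for x in new_p) / n_items)
        new_p = [x / gm for x in new_p]
        done = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if done:
            break
    return [math.log(x) for x in p]


# Toy (winner, loser) judgments over three texts; every pair is compared 4 times.
judgments = [(0, 1)] * 3 + [(1, 0)] + [(1, 2)] * 3 + [(2, 1)] + [(0, 2)] * 3 + [(2, 0)]
scores = bradley_terry_scores(judgments, n_items=3)
print([round(s, 3) for s in scores])
```

In this balanced toy input, every pair is compared equally often. As a result, the fitted order follows the win counts (6, 4, 2), and text 0 receives the highest score.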
