Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language
This work addresses how well LLMs generate persuasive language, a question relevant to researchers and practitioners concerned with AI-generated influence. Its contribution is incremental: it extends prior domain-specific studies to a general, cross-domain setting.
The study measured and benchmarked the ability of large language models (LLMs) to generate persuasive text across various domains, and found that different 'personas' in LLaMA3's system prompt substantially change persuasive language, even when the model is only instructed to paraphrase.
We are exposed to a great deal of information that tries to influence us, such as teaser messages, debates, politically framed news, and propaganda, all of which use persuasive language. With the recent interest in Large Language Models (LLMs), we study the ability of LLMs to produce persuasive text. As opposed to prior work, which focuses on particular domains or types of persuasion, we conduct a general study across various domains to measure and benchmark the degree to which LLMs produce persuasive language, both when explicitly instructed to rewrite text to be more or less persuasive and when only instructed to paraphrase. We construct a new dataset, Persuasive-Pairs, in which each pair consists of a short text and a rewrite of it by an LLM that amplifies or diminishes persuasive language. We multi-annotate the pairs on a relative scale for persuasive language. The resulting annotations are a valuable resource in themselves, and they also serve to train a regression model that scores and benchmarks persuasive language, including for new LLMs across domains. In our analysis, we find that different 'personas' in LLaMA3's system prompt change persuasive language substantially, even when only instructed to paraphrase.