Persuasiveness of Generated Free-Text Rationales in Subjective Decisions: A Case Study on Pairwise Argument Ranking
This work addresses the challenge of using LLM-generated rationales to support subjective decisions, such as in debate assistance. The contribution is incremental, since it builds on existing work on rationale generation.
The paper tackles the problem of evaluating the persuasiveness of free-text rationales that LLMs generate for subjective tasks, specifically pairwise argument ranking. It finds that open-source models such as Llama2-70B-chat can outperform GPT models in providing persuasive rationales, and that persuasiveness can be improved through prompting or self-refinement.
Generating free-text rationales is among the emergent capabilities of Large Language Models (LLMs). These rationales have been found to enhance LLM performance across various NLP tasks. Recently, there has been growing interest in using these rationales to provide insights for various important downstream tasks. In this paper, we analyze generated free-text rationales in tasks with subjective answers, emphasizing the importance of rationalization in such scenarios. We focus on pairwise argument ranking, a highly subjective task with significant potential for real-world applications, such as debate assistance. We evaluate the persuasiveness of rationales generated by nine LLMs to support their subjective choices. Our findings suggest that open-source LLMs, particularly Llama2-70B-chat, are capable of providing highly persuasive rationalizations, surpassing even GPT models. Additionally, our experiments show that rationale persuasiveness can be improved by controlling its parameters through prompting or through self-refinement.