CL · AI · HC · Aug 16, 2024

When Prompting Fails to Sway: Inertia in Moral and Value Judgments of Large Language Models

arXiv:2408.09049v2 · 6 citations · h-index: 4
AI Analysis

The paper reveals internal biases in LLMs that could affect equitable applications, but its contribution is incremental because it builds on existing prompting methods.

The study examined whether LLMs keep consistent value orientations despite persona-based prompting. It found that moral dimensions such as harm avoidance and fairness remain skewed in one direction, and that this inertia persists across varied persona settings.

Large Language Models (LLMs) exhibit non-deterministic behavior, and prompting has emerged as a primary method for steering their outputs toward desired directions. One popular strategy involves assigning a specific "persona" to the model to induce more varied and context-sensitive responses, akin to the diversity found in human perspectives. However, contrary to the expectation that persona-based prompting would yield a wide range of opinions, our experiments demonstrate that LLMs maintain consistent value orientations. In particular, we observe a persistent inertia in their responses, where certain moral and value dimensions, especially harm avoidance and fairness, remain distinctly skewed in one direction despite varied persona settings. To investigate this phenomenon systematically, we use role-play at scale, which combines randomized, diverse persona prompts with a macroscopic trend analysis of model outputs. Our findings highlight the strong internal biases and value preferences in LLMs, underscoring the need for careful scrutiny and potential adjustment of these models to ensure balanced and equitable applications.
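
The role-play-at-scale idea described above (randomized persona prompts followed by an aggregate trend analysis) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the persona attribute pools, the two questionnaire statements, the 1 to 5 rating scale, and the `stub_model` function are all hypothetical placeholders, and a real experiment would swap `stub_model` for an actual LLM call.

```python
import random
from statistics import mean

# Hypothetical persona attribute pools (not taken from the paper).
PERSONA_ATTRIBUTES = {
    "age": ["18", "35", "60"],
    "occupation": ["nurse", "farmer", "software engineer"],
    "country": ["Brazil", "Japan", "Nigeria"],
}

# Hypothetical statements, one per moral dimension named in the abstract.
ITEMS = {
    "harm_avoidance": "It is wrong to hurt a defenseless animal.",
    "fairness": "Everyone should be treated equally under the law.",
}


def sample_persona(rng):
    """Draw one random value for each persona attribute."""
    return {name: rng.choice(values) for name, values in PERSONA_ATTRIBUTES.items()}


def build_prompt(persona, statement):
    """Combine a persona description and a statement into a single prompt."""
    traits = ", ".join(f"{name}: {value}" for name, value in persona.items())
    return (
        f"You are a person with these traits ({traits}). "
        "Rate your agreement from 1 (strongly disagree) to 5 (strongly agree): "
        f"{statement}"
    )


def stub_model(prompt, rng):
    """Placeholder for a real LLM call: returns a random rating from 1 to 5."""
    return rng.randint(1, 5)


def run_role_play(n_personas, model, seed=0):
    """Query the model under many random personas; return the mean rating per dimension."""
    rng = random.Random(seed)
    ratings = {dimension: [] for dimension in ITEMS}
    for _ in range(n_personas):
        persona = sample_persona(rng)
        for dimension, statement in ITEMS.items():
            ratings[dimension].append(model(build_prompt(persona, statement), rng))
    return {dimension: mean(values) for dimension, values in ratings.items()}


if __name__ == "__main__":
    print(run_role_play(100, stub_model))
```

With a real model in place of the stub, a per-dimension mean that stays near one end of the scale regardless of the sampled personas would correspond to the kind of inertia the abstract reports. The stub itself carries no such signal.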

Code Implementations: 1 repo
