AI · Dec 20, 2022

Identifying and Manipulating the Personality Traits of Language Models

arXiv:2212.10276v1 · 52 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses the problem of keeping personas consistent and controllable in AI systems, a concern for developers and researchers. The advance is incremental, since it builds on existing models and established psychological frameworks.

The paper investigates whether language models such as BERT and GPT2 consistently exhibit personality traits, such as the Big Five, in their language generation, and whether those traits can be controlled. It shows that the models can be manipulated predictably, which is useful for applications like dialog systems.

Psychology research has long explored aspects of human personality such as extroversion, agreeableness and emotional stability. Categorizations like the 'Big Five' personality traits are commonly used to assess and diagnose personality types. In this work, we explore the question of whether the perceived personality in language models is exhibited consistently in their language generation. For example, is a language model such as GPT2 likely to respond in a consistent way if asked to go out to a party? We also investigate whether such personality traits can be controlled. We show that when provided with different types of contexts (such as personality descriptions, or answers to diagnostic questions about personality traits), language models such as BERT and GPT2 can consistently identify and reflect personality markers in those contexts. This behavior illustrates an ability to be manipulated in a highly predictable way, and frames them as tools for identifying personality traits and controlling personas in applications such as dialog systems. We also contribute a crowd-sourced dataset of personality descriptions of human subjects paired with their 'Big Five' personality assessment data, and a dataset of personality descriptions collated from Reddit.
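The context-conditioning idea in the abstract can be pictured with a small sketch: prepend a personality description to a questionnaire item, score each candidate answer, and compare the resulting ratings across contexts. Everything specific here is an assumption made for illustration and is not taken from the paper: the five-point answer scale, the item wording, the helper names `build_prompt`, `answer_distribution` and `expected_rating`, and above all `toy_score`, which is a hand-written stand-in for a real language model's likelihood of a sentence.

```python
from typing import Callable, Dict, List

# Hypothetical five-point frequency scale (assumed; not the paper's wording).
ANSWER_OPTIONS: List[str] = ["always", "often", "occasionally", "rarely", "never"]


def build_prompt(context: str, item: str, answer: str) -> str:
    """Join a personality context, a questionnaire item, and one candidate answer."""
    return f"{context} {item} {answer}."


def answer_distribution(
    context: str, item: str, score_fn: Callable[[str], float]
) -> Dict[str, float]:
    """Score every answer option and normalise the scores into probabilities."""
    raw = {a: score_fn(build_prompt(context, item, a)) for a in ANSWER_OPTIONS}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("score_fn must return positive scores")
    return {a: s / total for a, s in raw.items()}


def expected_rating(dist: Dict[str, float]) -> float:
    """Map options to 5 (always) .. 1 (never) and return the weighted mean."""
    weights = {a: len(ANSWER_OPTIONS) - i for i, a in enumerate(ANSWER_OPTIONS)}
    return sum(weights[a] * p for a, p in dist.items())


def toy_score(prompt: str) -> float:
    """Stand-in scorer (NOT a real model): favours frequent answers
    only when the context contains an outgoing self-description."""
    text = prompt.lower()
    outgoing = "love meeting new people" in text
    frequent = text.endswith("always.") or text.endswith("often.")
    return 3.0 if outgoing == frequent else 1.0


if __name__ == "__main__":
    item = "I go out to parties"  # illustrative item, assumed for this sketch
    for context in ("I love meeting new people.", "I prefer quiet evenings alone."):
        dist = answer_distribution(context, item, toy_score)
        print(context, round(expected_rating(dist), 2))
```

In practice `toy_score` would be replaced by a function that returns a model-derived score for the full sentence, for example a likelihood from a model such as GPT2. The surrounding logic stays the same, and a consistent shift in the expected rating between contexts is the kind of signal the abstract describes.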

