Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

arXiv:2604.2762469.8

AI Analysis

For researchers studying LLM bias and social alignment, this provides a large-scale synthetic dataset, validated for topic anchoring, that links prompts, language, and stances across diverse personas and topics.

The paper introduces Cognitive Digital Shadows (CDS), a 190,000-record synthetic corpus of LLM-generated discourse on controversial societal topics, created by prompting 19 LLMs with human personas or AI-assistant roles. The corpus enables analysis of how LLM outputs vary with social and contextual factors, supporting bias and alignment audits.

Large Language Models (LLMs) can strongly shape social discourse, yet datasets for investigating how LLM outputs vary under controlled social and contextual prompting remain scarce. Cognitive Digital Shadows (CDS) is a 190,000-record synthetic corpus supporting analyses of LLM-generated discourse. Each CDS record is generated by one of 19 LLMs, prompted to shadow either a human persona or an AI-assistant role. CDS contains LLM responses on four controversial societal topics: vaccines/healthcare, social media disinformation, the gender gap in science, and STEM stereotypes. Persona-conditioned records encode 17 sociodemographic and psychological attributes, providing data that link LLMs' prompts, language, stances and reasoning. Texts are validated for topic anchoring and can support emotional analyses via interpretable NLP (e.g., textual forma mentis networks). CDS is enriched by a pooling platform with user-friendly dashboards, enabling easy, interactive group-level comparisons of emotional and semantic framing across personas, topics and models. The CDS prompting framework supports future audits of LLMs' bias, social sensitivity and alignment.
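
The record structure described in the abstract can be pictured with a minimal sketch. The class name, field names, role labels and model names below are hypothetical and are not taken from the CDS release; only the four topic names come from the abstract. The toy records are placeholders, not CDS data.

```python
from collections import Counter
from dataclasses import dataclass, field

# The four topics named in the abstract.
TOPICS = (
    "vaccines/healthcare",
    "social media disinformation",
    "gender gap in science",
    "STEM stereotypes",
)

# Hypothetical labels for the two prompting conditions.
ROLES = ("persona", "assistant")


@dataclass
class CDSRecord:
    """Hypothetical layout of one record; the real CDS schema may differ."""
    model: str   # which LLM produced the text
    topic: str   # one of TOPICS
    role: str    # "persona" or "assistant"
    text: str    # the generated response
    # Persona attributes (the abstract mentions 17); empty for assistant roles.
    persona: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.topic not in TOPICS:
            raise ValueError(f"unknown topic: {self.topic!r}")
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")


def count_by(records, key):
    """Count records per value of one attribute, e.g. 'topic' or 'model'."""
    return Counter(getattr(r, key) for r in records)


# Placeholder records for illustration only.
records = [
    CDSRecord("model-a", "STEM stereotypes", "persona", "<response text>",
              persona={"age": "<age>", "education": "<education>"}),
    CDSRecord("model-a", "vaccines/healthcare", "assistant", "<response text>"),
    CDSRecord("model-b", "vaccines/healthcare", "persona", "<response text>",
              persona={"age": "<age>"}),
]

by_topic = count_by(records, "topic")
by_role = count_by(records, "role")
```

Grouping by a single attribute in this way is the simplest form of the group-level comparison the abstract describes; the same pattern would extend to persona attributes or to pairs such as (model, topic).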
