60.0CLApr 30
Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video MisinformationJen-tse Huang, Chang Chen, Shiyang Lai et al. · pku, tencent-ai
Short-video platforms have become major channels for misinformation, where deceptive claims frequently leverage visual experiments and social cues. While Multimodal Large Language Models (MLLMs) have demonstrated impressive reasoning capabilities, their robustness against misinformation entangled with cognitive biases remains under-explored. In this paper, we introduce a comprehensive evaluation framework using a high-quality, manually annotated dataset of 200 short videos spanning four health domains. This dataset provides fine-grained annotations for three deceptive patterns-experimental errors, logical fallacies, and fabricated claims-each verified by evidence such as national standards and academic literature. We evaluate eight frontier MLLMs across five modality settings. Experimental results demonstrate that Gemini-2.5-Pro achieves the highest performance in the multimodal setting with a belief score of 71.5/100, while o3 performs the worst at 35.2. Furthermore, we investigate social cues that induce false beliefs in videos and find that models are susceptible to biases like authoritative channel IDs.
CLJul 29, 2024
Can Editing LLMs Inject Harm?Canyu Chen, Baixiang Huang, Zekun Li et al.
Large Language Models (LLMs) have emerged as a new information channel. Meanwhile, one critical but under-explored question is: Is it possible to bypass the safety alignment and inject harmful information into LLMs stealthily? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation with a newly constructed dataset EditAttack. Specifically, we focus on two typical safety risks of Editing Attack including Misinformation Injection and Bias Injection. For the first risk, we find that editing attacks can inject both commonsense and long-tail misinformation into LLMs, and the effectiveness for the former one is particularly high. For the second risk, we discover that not only can biased sentences be injected into LLMs with high effectiveness, but also one single biased sentence injection can degrade the overall fairness. Then, we further illustrate the high stealthiness of editing attacks. Our discoveries demonstrate the emerging misuse risks of knowledge editing techniques on compromising the safety alignment of LLMs and the feasibility of disseminating misinformation or bias with LLMs as new channels.
LGDec 9, 2022
Multidimensional Service Quality Scoring SystemShiyang Lai
This supplementary paper aims to introduce the Multidimensional Service Quality Scoring System (MSQs), a review-based method for quantifying host service quality mentioned and employed in the paper Exit and transition: Exploring the survival status of Airbnb listings in a time of professionalization. MSQs is not an end-to-end implementation and is essentially composed of three pipelines, namely Data Collection and Preprocessing, Objects Recognition and Grouping, and Aspect-based Service Scoring. Using the study mentioned above as a case, the technical details of MSQs are explained in this article.
CLJan 15
Reasoning Models Generate Societies of ThoughtJunsol Kim, Shiyang Lai, Nino Scherrer et al.
Large language models have achieved remarkable capabilities across domains, yet mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable instruction-tuned models on complex cognitive tasks, attributed to extended computation through longer chains of thought. Here we show that enhanced reasoning emerges not from extended computation alone, but from simulating multi-agent-like interactions -- a society of thought -- which enables diversification and debate among internal cognitive perspectives characterized by distinct personality traits and domain expertise. Through quantitative analysis and mechanistic interpretability methods applied to reasoning traces, we find that reasoning models like DeepSeek-R1 and QwQ-32B exhibit much greater perspective diversity than instruction-tuned models, activating broader conflict between heterogeneous personality- and expertise-related features during reasoning. This multi-agent structure manifests in conversational behaviors, including question-answering, perspective shifts, and the reconciliation of conflicting views, and in socio-emotional roles that characterize sharp back-and-forth conversations, together accounting for the accuracy advantage in reasoning tasks. Controlled reinforcement learning experiments reveal that base models increase conversational behaviors when rewarded solely for reasoning accuracy, and fine-tuning models with conversational scaffolding accelerates reasoning improvement over base models. These findings indicate that the social organization of thought enables effective exploration of solution spaces. We suggest that reasoning models establish a computational parallel to collective intelligence in human groups, where diversity enables superior problem-solving when systematically structured, which suggests new opportunities for agent organization to harness the wisdom of crowds.
91.2AIMar 18
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model InfluenceBofan Gong, Shiyang Lai, James Evans et al.
Polysemanticity is pervasive in language models and remains a major challenge for interpretation and model behavioral control. Leveraging sparse autoencoders (SAEs), we map the polysemantic topology of two small models (Pythia-70M and GPT-2-Small) to identify SAE feature pairs that are semantically unrelated yet exhibit interference within models. We intervene at four foci (prompt, token, feature, neuron) and measure induced shifts in the next-token prediction distribution, uncovering polysemantic structures that expose a systematic vulnerability in these models. Critically, interventions distilled from counterintuitive interference patterns shared by two small models transfer reliably to larger instruction-tuned models (Llama-3.1-8B/70B-Instruct and Gemma-2-9B-Instruct), yielding predictable behavioral shifts without access to model internals. These findings challenge the view that polysemanticity is purely stochastic, demonstrating instead that interference structures generalize across scale and family. Such generalization suggests a convergent, higher-order organization of internal representations, which is only weakly aligned with intuition and structured by latent regularities, offering new possibilities for both black-box control and theoretical insight into human and artificial cognition.
67.9MAMay 16
Multi-LLM Systems Exhibit Robust Semantic CollapseWeiyi Kong, Shiyang Lai, Jinghua Piao et al.
Whether machines can originate novel content has been debated for nearly two centuries, from Lovelace's assertion that no engine can "originate anything" to Turing's question of whether a machine can amplify ideas brought in from outside. Multi-large language model (LLM) systems, increasingly deployed for autonomous generation, reopen this question empirically. Here we show that such systems, operating in closed loops, exhibit semantic collapse: systematic convergence in semantic representations despite apparent lexical variation. Across model families, extended simulations of 200 to 1,000 rounds, the pattern remains consistent. Twelve intervention strategies, spanning decoding parameters, prompt design, agent composition, activation engineering, and reinforcement learning, fail to restore semantic diversity. Mechanistic analyses suggest that semantic collapse is not explained by alignment or conformity biases, but is consistent with intrinsic properties of autoregressive generation. Our results point to fundamental constraints in the ability of multi-LLM systems to sustain open-ended knowledge production in closed-loop settings.
AIFeb 7, 2024
Can Large Language Model Agents Simulate Human Trust Behavior?Chengxing Xie, Canyu Chen, Feiran Jia et al.
Large Language Model (LLM) agents have been increasingly adopted as simulation tools to model humans in social science and role-playing applications. However, one fundamental question remains: can LLM agents really simulate human behavior? In this paper, we focus on one critical and elemental behavior in human interactions, trust, and investigate whether LLM agents can simulate human trust behavior. We first find that LLM agents generally exhibit trust behavior, referred to as agent trust, under the framework of Trust Games, which are widely recognized in behavioral economics. Then, we discover that GPT-4 agents manifest high behavioral alignment with humans in terms of trust behavior, indicating the feasibility of simulating human trust behavior with LLM agents. In addition, we probe the biases of agent trust and differences in agent trust towards other LLM agents and humans. We also explore the intrinsic properties of agent trust under conditions including external manipulations and advanced reasoning strategies. Our study provides new insights into the behaviors of LLM agents and the fundamental analogy between LLMs and humans beyond value alignment. We further illustrate broader implications of our discoveries for applications where trust is paramount.
CLOct 31, 2024
Hidden Persuaders: LLMs' Political Leaning and Their Influence on VotersYujin Potter, Shiyang Lai, Junsol Kim et al.
How could LLMs influence our democracy? We investigate LLMs' political leanings and the potential influence of LLMs on voters by conducting multiple experiments in a U.S. presidential election context. Through a voting simulation, we first demonstrate 18 open- and closed-weight LLMs' political preference for a Democratic nominee over a Republican nominee. We show how this leaning towards the Democratic nominee becomes more pronounced in instruction-tuned models compared to their base versions by analyzing their responses to candidate-policy related questions. We further explore the potential impact of LLMs on voter choice by conducting an experiment with 935 U.S. registered voters. During the experiments, participants interacted with LLMs (Claude-3, Llama-3, and GPT-4) over five exchanges. The experiment results show a shift in voter choices towards the Democratic nominee following LLM interaction, widening the voting margin from 0.7% to 4.6%, even though LLMs were not asked to persuade users to support the Democratic nominee during the discourse. This effect is larger than many previous studies on the persuasiveness of political campaigns, which have shown minimal effects in presidential elections. Many users also expressed a desire for further political interaction with LLMs. Which aspects of LLM interactions drove these shifts in voter choice requires further study. Lastly, we explore how a safety method can make LLMs more politically neutral, while raising the question of whether such neutrality is truly the path forward.
CLFeb 19, 2024
Evolving AI Collectives to Enhance Human Diversity and Enable Self-RegulationShiyang Lai, Yujin Potter, Junsol Kim et al.
Large language model behavior is shaped by the language of those with whom they interact. This capacity and their increasing prevalence online portend that they will intentionally or unintentionally "program" one another and form emergent AI subjectivities, relationships, and collectives. Here, we call upon the research community to investigate these "societies" of interacting artificial intelligences to increase their rewards and reduce their risks for human society and the health of online environments. We use a small "community" of models and their evolving outputs to illustrate how such emergent, decentralized AI collectives can spontaneously expand the bounds of human diversity and reduce the risk of toxic, anti-social behavior online. Finally, we discuss opportunities for AI cross-moderation and address ethical issues and design challenges associated with creating and maintaining free-formed AI collectives.
HCAug 12, 2025
Biased AI improves human decision-making but reduces trustShiyang Lai, Junsol Kim, Nadav Kunievsky et al.
Current AI systems minimize risk by enforcing ideological neutrality, yet this may introduce automation bias by suppressing cognitive engagement in human decision-making. We conducted randomized trials with 2,500 participants to test whether culturally biased AI enhances human decision-making. Participants interacted with politically diverse GPT-4o variants on information evaluation tasks. Partisan AI assistants enhanced human performance, increased engagement, and reduced evaluative bias compared to non-biased counterparts, with amplified benefits when participants encountered opposing views. These gains carried a trust penalty: participants underappreciated biased AI and overcredited neutral systems. Exposing participants to two AIs whose biases flanked human perspectives closed the perception-performance gap. These findings complicate conventional wisdom about AI neutrality, suggesting that strategic integration of diverse cultural biases may foster improved and resilient human decision-making.
GNMay 13, 2025
Big Data and the Computational Social Science of Entrepreneurship and InnovationNingzi Li, Shiyang Lai, James Evans
As large-scale social data explode and machine-learning methods evolve, scholars of entrepreneurship and innovation face new research opportunities but also unique challenges. This chapter discusses the difficulties of leveraging large-scale data to identify technological and commercial novelty, document new venture origins, and forecast competition between new technologies and commercial forms. It suggests how scholars can take advantage of new text, network, image, audio, and video data in two distinct ways that advance innovation and entrepreneurship research. First, machine-learning models, combined with large-scale data, enable the construction of precision measurements that function as system-level observatories of innovation and entrepreneurship across human societies. Second, new artificial intelligence models fueled by big data generate 'digital doubles' of technology and business, forming laboratories for virtual experimentation about innovation and entrepreneurship processes and policies. The chapter argues for the advancement of theory development and testing in entrepreneurship and innovation by coupling big data with big models.