LG, AI, ET, MA, SY
Nov 2, 2024

Interacting Large Language Model Agents. Interpretable Models and Social Learning

arXiv:2411.01271v2, 1 citation, h-index: 3, Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of mitigating harmful behaviors, such as bias and herding, in interacting AI agents for applications like online platforms, though it builds incrementally on existing microeconomics and signal processing methods.

The paper tackles the problem of understanding and controlling interacting large language model agents (LLMAs) that exhibit bias and herding behavior during social learning. It shows LLMAs act as rationally bounded Bayesian agents, proposes interpretable models capturing herding, and develops stochastic control methods that improve state estimation accuracy on real datasets like hate speech classification and product quality assessment.

This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and that, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under two settings: (a) centrally controlled LLMAs and (b) autonomous LLMAs with incentives. We demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models such as LLaMA and closed-source models such as ChatGPT. The main takeaway of this paper, based on empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
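The herding behavior mentioned in the abstract can be illustrated with the textbook binary model of sequential Bayesian social learning. The sketch below is not the paper's LLMA model: it assumes a two-valued state, a symmetric private signal with accuracy `q`, and agents that pick the action matching their most likely state. All function names and parameters are hypothetical and chosen only for illustration. Each agent combines the public belief with its private signal, and later agents see only the action. Once the public belief is strong enough that the action no longer depends on the private signal, actions stop carrying information and the public belief freezes, which is the herding (information cascade) effect.

```python
import random

def private_posterior(pi, y, q):
    """P(state = 1 | public belief pi, private signal y), signal accuracy q."""
    like1 = q if y == 1 else 1 - q      # P(y | state = 1)
    like0 = 1 - q if y == 1 else q      # P(y | state = 0)
    return pi * like1 / (pi * like1 + (1 - pi) * like0)

def choose_action(pi, y, q):
    """Pick the more likely state; on an exact tie, follow the private signal."""
    p = private_posterior(pi, y, q)
    if p > 0.5:
        return 1
    if p < 0.5:
        return 0
    return y

def is_herding(pi, q):
    """True when the action is the same for both possible private signals."""
    return choose_action(pi, 0, q) == choose_action(pi, 1, q)

def update_public_belief(pi, a, q):
    """Bayesian update of the public belief after observing action a."""
    if is_herding(pi, q):
        return pi  # the action reveals nothing about the private signal
    def action_likelihood(x):
        # P(a | state = x): sum over the signals that would lead to action a
        return sum((q if y == x else 1 - q)
                   for y in (0, 1) if choose_action(pi, y, q) == a)
    l1, l0 = action_likelihood(1), action_likelihood(0)
    return pi * l1 / (pi * l1 + (1 - pi) * l0)

def simulate(true_state, q, prior, n_agents, seed=0):
    """Run agents in sequence; return (index, signal, action, herding, belief)."""
    rng = random.Random(seed)
    pi = prior
    history = []
    for k in range(n_agents):
        y = true_state if rng.random() < q else 1 - true_state
        herding = is_herding(pi, q)
        a = choose_action(pi, y, q)
        history.append((k, y, a, herding, pi))
        pi = update_public_belief(pi, a, q)
    return history

if __name__ == "__main__":
    for k, y, a, herding, pi in simulate(true_state=1, q=0.7, prior=0.5, n_agents=12):
        print(f"agent {k}: signal={y} action={a} herding={herding} belief={pi:.3f}")
```

In this toy setting, a herd can lock onto the wrong state after a few misleading early signals. This is the failure mode that the paper's stochastic control framework aims to delay.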

