Many Minds from One Model: Bayesian-Inspired Transformers for Population Diversity
This work addresses the need for more varied and robust AI systems, offering an incremental improvement over existing deterministic transformers.
The paper addresses the lack of behavioral diversity in deterministic transformer models. It proposes Population Bayesian Transformers (B-Trans), which sample diverse model instances from a single pre-trained LLM. In zero-shot generation and Reinforcement Learning with Verifiable Rewards (RLVR) experiments, B-Trans yields greater response diversity and better task performance than deterministic baselines.
Despite their scale and success, modern transformers are usually trained as single-minded systems: optimization produces a deterministic set of parameters, representing a single functional hypothesis about the data. Motivated by the analogy to human populations, in which population-level intelligence emerges from diverse individual behaviors, we propose Population Bayesian Transformers (B-Trans), which enable sampling diverse yet coherent transformer large language model instances (each hereafter referred to as a 'mind') from a single pre-trained LLM. B-Trans introduces a Bayesian-inspired posterior proxy by injecting stochasticity directly into normalization layers, avoiding the prohibitive cost of training full Bayesian neural networks. Sampling from this proxy yields a population of minds with diverse behaviors while maintaining general competence. During the generation of each response, we sample a single realization from the proxy distribution and hold it fixed, ensuring temporal consistency and reasoning coherence. Experiments on zero-shot generation and Reinforcement Learning with Verifiable Rewards (RLVR) demonstrate that B-Trans effectively leverages this stochastic model diversity, yielding superior response diversity while achieving better task performance compared to deterministic baselines.
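
The sampling scheme described above (stochasticity injected into normalization layers, with one realization drawn per response and held fixed) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which normalization parameter is perturbed or which distribution is used, so the additive Gaussian noise on the gain vector, the `sigma` parameter, and the function names `sample_mind` and `generate` are all assumptions made for illustration.

```python
import math
import random


def layer_norm(x, gain, bias, eps=1e-5):
    """Plain layer normalization over a single feature vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gain, bias)]


def sample_mind(gain, sigma, rng):
    """Draw one perturbed copy of the gain vector (one hypothetical 'mind').

    Assumption: additive Gaussian noise with standard deviation `sigma`.
    The source does not state the distribution or the perturbed parameter.
    """
    return [g + rng.gauss(0.0, sigma) for g in gain]


def generate(inputs, gain, bias, sigma, seed):
    """Normalize a sequence of token vectors with ONE noise draw held fixed."""
    rng = random.Random(seed)
    mind_gain = sample_mind(gain, sigma, rng)  # sampled once per response
    # The same perturbed gain is reused for every token in the response.
    return [layer_norm(x, mind_gain, bias) for x in inputs]


if __name__ == "__main__":
    gain, bias = [1.0] * 4, [0.0] * 4
    tokens = [[0.5, -1.0, 2.0, 0.0], [1.5, 0.2, -0.7, 3.0]]
    # Different seeds stand in for different minds from the same base weights.
    response_a = generate(tokens, gain, bias, sigma=0.1, seed=0)
    response_b = generate(tokens, gain, bias, sigma=0.1, seed=1)
    print(response_a[0])
    print(response_b[0])
```

In this sketch, drawing the noise once in `generate` rather than inside `layer_norm` is what keeps a single mind consistent across all tokens of a response. Setting `sigma=0.0` recovers the deterministic baseline.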