Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective
This provides insights for AI researchers and cognitive scientists into interpreting language models, though it is incremental in applying existing cognitive frameworks.
The paper tackles the problem of understanding how language models operate by formulating a probabilistic cognitive model called the bounded pragmatic speaker, showing that RLHF-fine-tuned models resemble a fast-and-slow model of human thought.
How do language models "think"? This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker, which can characterize the operation of different variations of language models. Specifically, we demonstrate that large language models fine-tuned with reinforcement learning from human feedback (Ouyang et al., 2022) embody a model of thought that conceptually resembles a fast-and-slow model (Kahneman, 2011), which psychologists have attributed to humans. We discuss the limitations of reinforcement learning from human feedback as a fast-and-slow model of thought and propose avenues for expanding this framework. In essence, our research highlights the value of adopting a cognitive probabilistic modeling approach to gain insights into the comprehension, evaluation, and advancement of language models.