Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game
This work addresses the challenge of creating more flexible and communicative AI agents for advancing toward AGI, representing a domain-specific incremental improvement.
To develop AI agents capable of both strategic decision-making and communication, the paper proposes Multi-agent Kahneman & Tversky's Optimization (MaKTO), which refines models through in-context interaction in the Werewolf game. MaKTO achieves a 61% average win rate, outperforming GPT-4o and two-stage RL agents by relative improvements of 23.0% and 10.9%, respectively. It also shows human-like performance, winning 60% of games against expert players and showing only 49% detectability in blind tests.
Achieving Artificial General Intelligence (AGI) requires AI agents that can not only make strategic decisions but also engage in flexible and meaningful communication. Inspired by Wittgenstein's language game theory in Philosophical Investigations, we propose that language agents can learn through in-context interaction rather than through traditional multi-stage frameworks that separate decision-making from language expression. Using Werewolf, a social deduction game that tests language understanding, strategic interaction, and adaptability, we develop Multi-agent Kahneman & Tversky's Optimization (MaKTO). MaKTO engages diverse models in extensive gameplay to generate unpaired desirable and unacceptable responses, then employs KTO to refine the model's decision-making process. In 9-player Werewolf games, MaKTO achieves a 61% average win rate across various models, outperforming GPT-4o and two-stage RL agents by relative improvements of 23.0% and 10.9%, respectively. Notably, MaKTO also demonstrates human-like performance, winning 60% of games against expert players and showing only 49% detectability in Turing-style blind tests.
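The step of collecting unpaired desirable and unacceptable responses can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the `Turn` record, the `to_unpaired_examples` and `label_balance` helpers, and the `prompt`/`completion`/`label` field names are all hypothetical, and the abstract does not say how a response is judged desirable or unacceptable, so the flag is simply taken as given here.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One agent response taken from a game log (hypothetical record layout)."""
    prompt: str      # game context shown to the agent before it responds
    response: str    # the agent's speech or action
    desirable: bool  # assumed to come from some external judgment of the response


def to_unpaired_examples(turns):
    """Convert turns into unpaired examples, each carrying its own binary label.

    Unlike pairwise preference data, an example here does not need a matching
    "rejected" or "chosen" counterpart for the same prompt.
    """
    return [
        {"prompt": t.prompt, "completion": t.response, "label": t.desirable}
        for t in turns
    ]


def label_balance(examples):
    """Return (number of desirable examples, number of unacceptable examples)."""
    desirable = sum(1 for e in examples if e["label"])
    return desirable, len(examples) - desirable
```

Because each example stands alone, responses from many different games and models can be pooled without matching them up by prompt; `label_balance` is only a convenience for checking how skewed the pooled set is before any training step.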