LG | AI | HC | Oct 7, 2025

LLMs as Policy-Agnostic Teammates: A Case Study in Human Proxy Design for Heterogeneous Agent Teams

arXiv:2510.06151v14.11 citations | h-index: 2 | ECAI

Originality: Incremental advance

AI Analysis

The paper provides a scalable method for simulating human-like teammates in heterogeneous-agent teams. The advance is incremental, since it applies existing LLM capabilities to proxy design rather than introducing a new modelling technique.

The paper tackles the challenge of training agents to collaborate with inaccessible or non-stationary teammates like humans by proposing LLMs as policy-agnostic human proxies to generate synthetic data mimicking human decision-making. Results show LLMs align closely with experts in decision criteria, mirror human variability in risk-sensitive strategies, and produce trajectories resembling human paths in a grid-world game.

A critical challenge in modelling Heterogeneous-Agent Teams is training agents to collaborate with teammates whose policies are inaccessible or non-stationary, such as humans. Traditional approaches rely on expensive human-in-the-loop data, which limits scalability. We propose using Large Language Models (LLMs) as policy-agnostic human proxies to generate synthetic data that mimics human decision-making. To evaluate this, we conduct three experiments in a grid-world capture game inspired by Stag Hunt, a game theory paradigm that balances risk and reward. In Experiment 1, we compare decisions from 30 human participants and 2 expert judges with outputs from LLaMA 3.1 and Mixtral 8x22B models. LLMs, prompted with game-state observations and reward structures, align more closely with experts than participants, demonstrating consistency in applying underlying decision criteria. Experiment 2 modifies prompts to induce risk-sensitive strategies (e.g. "be risk averse"). LLM outputs mirror human participants' variability, shifting between risk-averse and risk-seeking behaviours. Finally, Experiment 3 tests LLMs in a dynamic grid-world where the LLM agents generate movement actions. LLMs produce trajectories resembling human participants' paths. While LLMs cannot yet fully replicate human adaptability, their prompt-guided diversity offers a scalable foundation for simulating policy-agnostic teammates.
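To make the risk-reward trade-off in Stag Hunt concrete, the following minimal Python sketch computes expected payoffs for the two classic actions and assembles a prompt from an observation, the reward structure, and an optional risk directive. It is an illustration under stated assumptions, not the authors' setup: the payoff values, the `PAYOFFS` table, the function names, and the prompt wording are all hypothetical. Only the idea of prompting with game-state observations and reward structures, and the example directive "be risk averse", come from the abstract.

```python
# Hypothetical Stag Hunt payoffs for the row player; the abstract does not
# state the actual reward values used in the grid-world game.
PAYOFFS = {
    ("stag", "stag"): 4,  # both cooperate: high reward
    ("stag", "hare"): 0,  # hunting the stag alone fails
    ("hare", "stag"): 2,  # hare is a safe, smaller reward
    ("hare", "hare"): 2,
}


def expected_payoff(action, p_partner_stag):
    """Expected reward of `action` if the partner picks stag with probability p."""
    return (p_partner_stag * PAYOFFS[(action, "stag")]
            + (1 - p_partner_stag) * PAYOFFS[(action, "hare")])


def best_action(p_partner_stag):
    """Risk-neutral choice: the action with the higher expected payoff."""
    return max(("stag", "hare"), key=lambda a: expected_payoff(a, p_partner_stag))


def build_prompt(observation, risk_directive=None):
    """Assemble an illustrative prompt; the wording is an assumption."""
    rewards = ", ".join(
        f"you={me}/partner={other}: {value}"
        for (me, other), value in PAYOFFS.items()
    )
    lines = [
        f"Observation: {observation}",
        f"Rewards: {rewards}",
    ]
    if risk_directive:
        lines.append(f"Instruction: {risk_directive}")
    lines.append("Choose one action: stag or hare.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(best_action(0.8))
    print(build_prompt("partner is two cells from the stag", "be risk averse"))
```

With these assumed payoffs, hunting the stag pays off only when the partner is believed to cooperate at least half the time, which is the tension a risk directive in the prompt is meant to shift.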
