William Hunt

74.5ROApr 13Code

MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation

Toby Godfrey, William Hunt, Mohammad D. Soorati

Multi-agent reinforcement learning is a key method for training multi-robot systems. Through rewarding or punishing robots over a series of episodes according to their performance, they can be trained and then deployed in the real world. However, poorly trained policies can lead to unsafe behaviour during early training stages. We introduce Multi-Agent Reinforcement Learning guided by language-based Inter-robot Negotiation (MARLIN), a hybrid framework in which large language models provide high-level planning before the reinforcement learning policy has learned effective behaviours. Robots use language models to negotiate actions and generate plans that guide policy learning. The system dynamically switches between reinforcement learning and language-model-based negotiation during training, enabling safer and more effective exploration. MARLIN is evaluated using both simulated and physical robots with local and remote language models. Results show that, compared to standard multi-agent reinforcement learning, the hybrid approach achieves higher performance in early training without reducing final performance. The code is available at https://github.com/SooratiLab/MARLIN.

2.0ROApr 23

Effects of Swarm Size Variability on Operator Workload

William Hunt, Aleksandra Landowska, Horia A. Maior et al.

Real-world deployments of human--swarm teams depend on balancing operator workload to leverage human strengths without inducing overload. A key challenge is that swarm size is often dynamic: robots may join or leave the mission due to failures or redeployment, causing abrupt workload fluctuations. Understanding how such changes affect human workload and performance is critical for robust human--swarm interaction design. This paper investigates how the magnitude and direction of changes in swarm size influence operator workload. Drawing on the concept of workload history, we test three hypotheses: (1) workload remains elevated following decreases in swarm size, (2) small increases are more manageable than large jumps, and (3) sufficiently large changes override these effects by inducing a cognitive reset. We conducted two studies (N = 34) using a monitoring task with simulated drone swarms of varying sizes. By varying the swarm size between episodes, we measured perceived workload relative to swarm size changes. Results show that objective performance is largely unaffected by small changes in swarm size, while subjective workload is sensitive to both change direction and magnitude. Small increases preserve lower workload, whereas small decreases leave workload elevated, indicating workload residue; large changes in either direction attenuate these effects, suggesting a reset response. These findings offer actionable guidance for managing swarm-size transitions to support operator workload in dynamic human--swarm systems.

William Hunt

2 Papers