CL, CV, Mar 26, 2024

Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies

arXiv:2403.17497v181 citations, h-index: 14, LREC
Originality: Incremental advance
AI Analysis

This work addresses cost-sharing in collaborative interactions between AI agents. It appears incremental: it builds on existing methods, and the learned agent pairing still leaves room for improvement compared to a reasonable heuristic pairing.

The authors address the evaluation and learning of collaborative multi-agent instruction giving and following policies by proposing an interactive reference game that requires coordination on vision and language observations. Their results show that a PPO setup achieves a high success rate when bootstrapped with heuristic partner behaviors, and that neural partners reduce joint effort when playing together repeatedly.

In collaborative goal-oriented settings, the participants are not only interested in achieving a successful outcome, but also implicitly negotiate the effort they put into the interaction (by adapting to each other). In this work, we propose a challenging interactive reference game that requires two players to coordinate on vision and language observations. The learning signal in this game is a score (given after playing) that takes into account the achieved goal and the players' assumed efforts during the interaction. We show that a standard Proximal Policy Optimization (PPO) setup achieves a high success rate when bootstrapped with heuristic partner behaviors that implement insights from the analysis of human-human interactions. We also find that a pairing of neural partners indeed reduces the measured joint effort when playing together repeatedly. However, we observe that in comparison to a reasonable heuristic pairing there is still room for improvement, which invites further research in the direction of cost-sharing in collaborative interactions.

Code Implementations: 1 repo