AI · LG · MA · Jan 24, 2025

Distributed Multi-Agent Coordination Using Multi-Modal Foundation Models

arXiv:2501.14189v1 · 4 citations · h-index: 49
Originality: Incremental advance
AI Analysis

This work automates problem construction for multi-agent coordination, a task relevant to AI researchers and practitioners. The advance appears incremental: it combines existing DCOP frameworks with foundation models.

The paper tackles the labor-intensive, manual construction of Distributed Constraint Optimization Problems (DCOPs). It introduces VL-DCOPs, a framework that uses large multimodal foundation models to generate constraints automatically from visual and linguistic instructions, and it evaluates neuro-symbolic and fully neural agent archetypes on three novel tasks.

Distributed Constraint Optimization Problems (DCOPs) offer a powerful framework for multi-agent coordination but often rely on labor-intensive, manual problem construction. To address this, we introduce VL-DCOPs, a framework that takes advantage of large multimodal foundation models (LFMs) to automatically generate constraints from both visual and linguistic instructions. We then introduce a spectrum of agent archetypes for solving VL-DCOPs: from a neuro-symbolic agent that delegates some of the algorithmic decisions to an LFM, to a fully neural agent that depends entirely on an LFM for coordination. We evaluate these agent archetypes using state-of-the-art LLMs (large language models) and VLMs (vision language models) on three novel VL-DCOP tasks and compare their respective advantages and drawbacks. Lastly, we discuss how this work extends to broader frontier challenges in the DCOP literature.
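For readers unfamiliar with DCOPs, the sketch below shows the kind of object the abstract refers to: agents that each own a variable, cost functions over pairs of variables, and the goal of finding a joint assignment with minimum total cost. It is a toy illustration and not the paper's method. The agent names, domains, and the `clash` cost function are hypothetical and written by hand here, whereas VL-DCOPs would have a foundation model generate such constraints from instructions. The solver is a centralized brute-force search used only to keep the example short; real DCOP algorithms solve this through message passing between agents.

```python
from itertools import product

# Hypothetical toy problem: three agents each pick one of two time slots.
domains = {"a1": [0, 1], "a2": [0, 1], "a3": [0, 1]}

def clash(x, y):
    """Hand-written cost: 1 if two neighbouring agents pick the same slot."""
    return 1 if x == y else 0

# Binary constraints: a pair of variables mapped to a cost function.
# In VL-DCOPs these would be produced by a foundation model (assumption:
# represented here as plain Python callables for illustration).
constraints = {
    ("a1", "a2"): clash,
    ("a2", "a3"): clash,
}

def total_cost(assignment):
    """Sum the cost of every constraint under a full assignment."""
    return sum(f(assignment[i], assignment[j]) for (i, j), f in constraints.items())

def solve_brute_force(domains):
    """Centralized exhaustive search; a stand-in for a distributed solver."""
    names = list(domains)
    best = None
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        cost = total_cost(assignment)
        if best is None or cost < best[1]:
            best = (assignment, cost)
    return best

best_assignment, best_cost = solve_brute_force(domains)
```

In this toy instance a zero-cost assignment exists whenever neighbouring agents choose different slots, so `best_cost` is 0.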

