LG · MA · Aug 12, 2025

Constrained Black-Box Attacks Against Multi-Agent Reinforcement Learning

arXiv:2508.09275v1 · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses security risks in sensitive real-world applications of multi-agent reinforcement learning, though the advance is incremental: it narrows the focus to more constrained attack scenarios.

The paper tackles adversarial vulnerabilities in collaborative multi-agent reinforcement learning under realistic constraints, such as perturbing observations without access to policy weights. It reports effective attacks across 22 environments and is sample-efficient, requiring only 1,000 samples.

Collaborative multi-agent reinforcement learning (c-MARL) has rapidly evolved, offering state-of-the-art algorithms for real-world applications, including sensitive domains. However, a key challenge to its widespread adoption is the lack of a thorough investigation into its vulnerabilities to adversarial attacks. Existing work predominantly focuses on training-time attacks or unrealistic scenarios, such as access to policy weights or the ability to train surrogate policies. In this paper, we investigate new vulnerabilities under more realistic and constrained conditions, assuming an adversary can only collect and perturb the observations of deployed agents. We also consider scenarios where the adversary has no access at all. We propose simple yet highly effective algorithms for generating adversarial perturbations designed to misalign how victim agents perceive their environment. Our approach is empirically validated on three benchmarks and 22 environments, demonstrating its effectiveness across diverse algorithms and environments. Furthermore, we show that our algorithm is sample-efficient, requiring only 1,000 samples compared to the millions needed by previous methods.
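
To make the threat model concrete, here is a minimal sketch of an adversary that can only read and modify the observations of deployed agents, with no access to policy weights. It is not the authors' algorithm, which the abstract does not specify. The random sign perturbation is a stand-in baseline, and the per-feature budget `epsilon` and the function names `perturb_observation` and `attack_step` are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def perturb_observation(
    obs: Sequence[float], epsilon: float, rng: random.Random
) -> List[float]:
    """Shift every feature of one agent's observation by +epsilon or -epsilon.

    Assumption: the adversary is limited to a per-feature budget `epsilon`.
    The source abstract does not state a specific perturbation budget.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    # Black-box setting: no policy or gradient is consulted, only the observation.
    return [x + epsilon * rng.choice((-1.0, 1.0)) for x in obs]


def attack_step(
    joint_obs: Sequence[Sequence[float]], epsilon: float, rng: random.Random
) -> List[List[float]]:
    """Perturb each agent's observation independently for one time step."""
    return [perturb_observation(obs, epsilon, rng) for obs in joint_obs]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical agents with observation vectors of different sizes.
    clean = [[0.5, -1.0, 2.0], [0.0, 0.25]]
    print(attack_step(clean, epsilon=0.1, rng=rng))
```

A random baseline like this is useful mainly as a reference point: any attack that claims to misalign the agents' perception should degrade team reward more than noise of the same magnitude.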
