MA · LG · May 26

Cost of Structural Learning Under Censored Feedback: A Threshold-Bandit Approach

arXiv:2605.27076 · 18.5
Predicted impact: top 84% in MA · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the challenge of learning under censored feedback in multi-agent systems, which is relevant for applications like coalition formation and task allocation.

The paper introduces the Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) problem, in which rewards are observed only when a coalition meets an unknown size threshold, and presents a centralized algorithm (C-TAC) and a decentralized one (D-TAC). C-TAC achieves O(log T) cumulative regret. In experiments, D-TAC cuts communication by 23x relative to the centralized baseline and preserves feasibility alignment.

In many multi-agent applications, tasks yield rewards only when executed by a coalition meeting an unknown size threshold; otherwise, feedback is fully censored. This censorship creates an identifiability problem: agents cannot distinguish stochastic failure from insufficient coordination. We formalize this setting as the Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and analyze it under both centralized and decentralized coordination. We show that a centralized algorithm (C-TAC) achieves cumulative regret O(log T), decomposed into a structural-search term that captures the cost of resolving feasibility under censored feedback and a statistical-monitoring term for value estimation. We then introduce D-TAC, a decentralized event-triggered protocol in which agents synchronize only when their structural beliefs change. Empirically, D-TAC achieves a 23x reduction in communication relative to the centralized baseline while preserving feasibility alignment under conservative belief fusion. These results characterize the coordination cost of learning under censored feedback and show that near-centralized communication efficiency is achievable without continuous synchronization.
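The identifiability problem described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's C-TAC or D-TAC: the class `ThresholdArm`, the function `smallest_rewarded_size`, Bernoulli rewards, and the convention that a censored pull returns 0 are all hypothetical choices made for illustration.

```python
import random


class ThresholdArm:
    """Hypothetical task that pays out only if the coalition is large enough."""

    def __init__(self, threshold, mean, rng):
        self.threshold = threshold  # hidden minimum coalition size (assumed integer)
        self.mean = mean            # hidden Bernoulli success probability (assumed)
        self.rng = rng

    def pull(self, coalition_size):
        if coalition_size < self.threshold:
            # Censored feedback: indistinguishable from a stochastic failure.
            return 0
        return 1 if self.rng.random() < self.mean else 0


def smallest_rewarded_size(arm, max_size, pulls_per_size):
    """Naive structural search: grow the coalition until some reward is seen.

    A single observed reward proves the threshold is at most the current size.
    A run of zeros proves nothing, so the returned size is only an upper bound
    on the threshold; None means no reward was seen at any size tried.
    """
    for size in range(1, max_size + 1):
        if any(arm.pull(size) for _ in range(pulls_per_size)):
            return size
    return None


rng = random.Random(0)
arm = ThresholdArm(threshold=3, mean=0.5, rng=rng)
print(smallest_rewarded_size(arm, max_size=5, pulls_per_size=50))
```

The asymmetry in the sketch is the point: a reward is conclusive evidence of feasibility, while a zero is ambiguous. Every pull spent below the threshold returns no information about the arm's value, which is one way to read the structural-search cost the abstract separates from value estimation.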
