Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations
This work addresses resource management in multi-agent systems, offering a novel decentralized approach with theoretical guarantees, though it is incremental in extending existing methods to handle noisy feedback.
The paper tackles the problem of coordinating competitive online learning agents with coupled resource constraints and noisy utility feedback. It introduces a decentralized resource pricing method under which the population dynamics converge almost surely to a generalized Nash equilibrium, and it provides a finite-time bound on the expected constraint violation.
Competitive, non-cooperative online decision-making agents whose actions increase the congestion of scarce resources serve as a model for many modern large-scale applications. To ensure sustainable resource behavior, we introduce a novel method that steers the agents toward a stable population state satisfying the given coupled resource constraints. The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian. Assuming that the online learning agents have only noisy first-order utility feedback, we show that, for a polynomially decaying agent step size (learning rate), the population dynamics almost surely converge to a generalized Nash equilibrium. A particular consequence is that the resource constraints are fulfilled in the asymptotic limit. Moreover, we investigate the finite-time quality of the proposed algorithm by giving a nonasymptotic, time-decaying bound on the expected amount of resource constraint violation.
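To make the setting concrete, the following is a minimal toy sketch of price-coordinated online learning. It is not the paper's algorithm. It assumes a single shared resource, hypothetical quadratic utilities u_i(x_i) = a_i x_i - x_i^2 / 2 on [0, 1], Gaussian noise on each agent's gradient observation, and a plain projected dual price update in place of the augmented-Lagrangian pricing described above. The function name `simulate` and all parameter names and constants are illustrative assumptions.

```python
import random


def simulate(a, capacity, steps=20000, sigma=0.1, c=0.3, p=0.6, seed=0):
    """Toy price-coordinated learning on one shared resource (illustrative only)."""
    rng = random.Random(seed)
    x = [0.0 for _ in a]   # each agent's resource usage, kept in [0, 1]
    price = 0.0            # resource price, kept nonnegative

    for t in range(steps):
        gamma = c / (t + 1) ** p   # polynomially decaying step size
        load = sum(x)              # current aggregate resource load

        new_x = []
        for a_i, x_i in zip(a, x):
            # Noisy first-order feedback: true gradient a_i - x_i plus noise.
            noisy_grad = (a_i - x_i) + rng.gauss(0.0, sigma)
            # Gradient ascent on the priced utility, projected onto [0, 1].
            step = x_i + gamma * (noisy_grad - price)
            new_x.append(min(1.0, max(0.0, step)))

        # Price rises when the load exceeds capacity and falls otherwise.
        price = max(0.0, price + gamma * (load - capacity))
        x = new_x

    return x, price


x, price = simulate([0.9, 0.8, 0.7], capacity=1.2)
print("usage:", [round(v, 3) for v in x])
print("load:", round(sum(x), 3), "price:", round(price, 3))
```

For these toy parameters the constrained equilibrium can be worked out by hand: each agent plays x_i = a_i - price, so the capacity condition 2.4 - 3 * price = 1.2 gives a price of 0.4 and usages (0.5, 0.4, 0.3). A run of the sketch should end close to these values, with the load settling near the capacity of 1.2.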