LG, GT, ML
Sep 6, 2022

A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games

arXiv:2209.02838v1
8 citations
h-index: 97
Originality: Incremental advance
AI Analysis

This work addresses risk-averse learning for agents in multi-agent games. The advance is incremental: it builds on known algorithms and improves how they handle bandit feedback.

The paper tackles risk-averse learning in repeated unknown games by proposing a momentum-based algorithm that uses bandit feedback to estimate conditional value at risk (CVaR), achieving sub-linear regret and outperforming existing methods in numerical experiments on a Cournot game.

We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
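To make the two ingredients named in the abstract concrete, the following is a minimal sketch of estimating CVaR from sampled costs and turning that estimate into a one-point zeroth-order gradient estimate. It is not the paper's algorithm: the momentum step that reuses historical cost values is omitted, the function names `empirical_cvar` and `one_point_gradient` are hypothetical, and the convention that `alpha` is the tail fraction of worst costs (so `alpha = 1` recovers the plain mean) is an assumption.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Mean of the worst ceil(alpha * n) sampled costs (higher cost = worse).

    Assumed convention: alpha in (0, 1] is the tail fraction, so alpha = 1
    is the risk-neutral sample mean.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    k = max(1, int(np.ceil(alpha * costs.size)))
    return float(costs[-k:].mean())

def one_point_gradient(cvar_estimate, direction, delta):
    """One-point zeroth-order gradient estimate from a single perturbed query.

    `direction` is a unit vector u. The agent is assumed to have played
    x + delta * u and to have estimated its CVaR at that perturbed action.
    """
    direction = np.asarray(direction, dtype=float)
    return (direction.size / delta) * cvar_estimate * direction

# Hypothetical toy data: ten cost samples observed at one perturbed action.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cvar = empirical_cvar(samples, alpha=0.2)  # averages the two largest costs
grad = one_point_gradient(cvar, direction=[1.0], delta=0.5)
```

An agent would then take a projected descent step along `-grad`. Under bandit feedback the samples come only from the agent's own realized costs, which is why the estimate is noisy and why reusing past samples, as the abstract proposes, can help.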
