LGGT · Jun 30, 2023

U-Calibration: Forecasting for an Unknown Agent

arXiv:2307.00168v1 · 43 citations · h-index: 62
Originality: Highly original
AI Analysis

This work addresses a foundational issue in forecasting for unknown agents. It proposes a new evaluation metric, U-calibration, that still guarantees low regret for every agent while allowing faster convergence than traditional calibration.

The paper tackles the problem of evaluating forecasts for binary events when the utility of the rational agents acting on the predictions is unknown. It shows that optimizing for a single scoring rule fails to guarantee low regret for all agents. Calibrated forecasts do provide that guarantee, but calibrated forecasting procedures have provably worse convergence rates. The authors introduce U-calibration, a new metric whose sublinear error is necessary and sufficient for all agents to achieve sublinear regret, and give an online algorithm achieving O(√T) U-calibration error, matching the optimal rate for a single scoring rule.
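Why calibration suffices can be seen in a simple family of agents. The derivation below uses our own conventions, which are assumptions and not necessarily the paper's notation. Forecasts are $p_t \in [0,1]$ and outcomes are $y_t \in \{0,1\}$. A hypothetical threshold agent with $v \in [0,1]$ acts ($a_t = +1$) when $p_t > v$ and abstains ($a_t = -1$) otherwise, and it incurs the bounded loss $(v - y_t)\,a_t$. Its regret is measured against the best fixed action in hindsight. Let $n_p$ count the rounds with forecast $p$, let $\bar y_p$ be the empirical event frequency on those rounds, and define the ℓ1 calibration error as $\mathrm{Cal} = \sum_p n_p\,|p - \bar y_p|$. Under these assumptions, every threshold agent's regret is at most twice the calibration error:

```latex
\begin{aligned}
\mathrm{Reg}_v &= \sum_{t=1}^{T} (v - y_t)\,a_t + \Bigl|\sum_{t=1}^{T} (v - y_t)\Bigr| \\
&= \sum_{p} n_p\,a(p)\,(v - \bar y_p) + \Bigl|\sum_{p} n_p\,(v - \bar y_p)\Bigr| \\
&\le \Bigl(-\sum_{p} n_p\,|v - p| + \mathrm{Cal}\Bigr) + \Bigl(\sum_{p} n_p\,|v - p| + \mathrm{Cal}\Bigr) = 2\,\mathrm{Cal}.
\end{aligned}
```

The inequality splits $v - \bar y_p = (v - p) + (p - \bar y_p)$ and uses $a(p)(v - p) = -|v - p|$. This identity holds because the agent acts exactly when $p > v$. Sublinear calibration error therefore forces sublinear regret for every agent in this family.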

We consider the problem of evaluating forecasts of binary events whose predictions are consumed by rational agents who take an action in response to a prediction, but whose utility is unknown to the forecaster. We show that optimizing forecasts for a single scoring rule (e.g., the Brier score) cannot guarantee low regret for all possible agents. In contrast, forecasts that are well-calibrated guarantee that all agents incur sublinear regret. However, calibration is not a necessary criterion here (it is possible for miscalibrated forecasts to provide good regret guarantees for all possible agents), and calibrated forecasting procedures have provably worse convergence rates than forecasting procedures targeting a single scoring rule. Motivated by this, we present a new metric for evaluating forecasts that we call U-calibration, equal to the maximal regret of the sequence of forecasts when evaluated under any bounded scoring rule. We show that sublinear U-calibration error is a necessary and sufficient condition for all agents to achieve sublinear regret guarantees. We additionally demonstrate how to compute the U-calibration error efficiently and provide an online algorithm that achieves $O(\sqrt{T})$ U-calibration error (on par with optimal rates for optimizing for a single scoring rule, and bypassing lower bounds for the traditionally calibrated learning procedures). Finally, we discuss generalizations to the multiclass prediction setting.
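The abstract states that the U-calibration error can be computed efficiently. The sketch below is not the paper's procedure. It computes a related quantity: the worst-case regret over hypothetical threshold agents. An agent with threshold v acts (+1) when the forecast exceeds v and abstains (-1) otherwise, with the bounded loss (v - y_t) * a_t. Proper scoring rules for binary outcomes can be written as mixtures of such threshold rules, so this maximum is a natural proxy for U-calibration error. We assume that relationship rather than reproduce it. The function names, the loss scaling, and the tie-breaking at p_t = v are our own illustrative choices. The sketch also computes the ℓ1 calibration error for comparison.

```python
import numpy as np


def threshold_regret(v, y, act):
    """Regret of a hypothetical threshold-v agent that played `act` (+1/-1 entries).

    Per-round loss is (v - y_t) * a_t, which lies in [-1, 1]; the benchmark is
    the best single action in hindsight, whose total loss is -|sum_t (v - y_t)|.
    """
    loss = np.sum((v - y) * act)
    best_fixed = -abs(np.sum(v - y))
    return loss - best_fixed


def max_threshold_regret(p, y):
    """Supremum over v in [0, 1] of threshold-agent regret (illustrative proxy).

    Between consecutive breakpoints (0, 1 and the distinct forecasts) the action
    pattern is fixed, so regret is linear + |linear| in v, i.e. convex, and its
    supremum over the open interval is reached at an endpoint (as a limit).
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    pts = np.unique(np.concatenate(([0.0, 1.0], p)))
    candidates = []
    for v in pts:
        # Exactly at a breakpoint: the agent acts only if p_t > v.
        act = np.where(p > v, 1.0, -1.0)
        candidates.append(threshold_regret(v, y, act))
    for lo, hi in zip(pts[:-1], pts[1:]):
        # Inside (lo, hi) no forecast lies strictly between, so p_t > v iff p_t >= hi.
        act = np.where(p >= hi, 1.0, -1.0)
        candidates.append(threshold_regret(lo, y, act))
        candidates.append(threshold_regret(hi, y, act))
    return max(candidates)


def l1_calibration_error(p, y):
    """Sum over distinct forecasts q of n_q * |q - empirical frequency on those rounds|."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    total = 0.0
    for q in np.unique(p):
        mask = p == q
        total += mask.sum() * abs(q - y[mask].mean())
    return total
```

Suppose the forecaster predicts 0.9 on four rounds that all resolve to 0. The worst-case threshold regret is then 7.2, approached as v tends to 0.9 from below, and the ℓ1 calibration error is 3.6. A constant forecast of 0.5 on outcomes 1, 0, 1, 0 is calibrated and gives zero worst-case threshold regret. Evaluating only at breakpoints and interval endpoints keeps the cost at O(T) candidates, each costing O(T) in this naive version.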
