Proper Calibeating
For researchers in forecasting and decision theory, this work provides a theoretical foundation for forecasts that are calibrated with respect to every proper scoring rule rather than only the quadratic one. It is an incremental extension of existing calibration concepts.
The paper extends the concepts of calibration and calibeating from quadratic scoring rules to all proper scoring rules, defining proper-calibration and proper-calibeating. It shows calibration implies proper-calibration but calibeating does not imply proper-calibeating, and provides methods to achieve proper-calibeating and proper-multicalibeating, with an equivalence to universal no regret in decision-making.
The classic concept of "calibrated forecasts" and its more recent refinement, "calibeating," are defined with respect to the standard quadratic scoring rule. We extend these notions to the class of $\textit{proper}$ scoring rules (for which the best forecast is the true distribution) and define $\textit{proper-calibration}$ and $\textit{proper-calibeating}$ by requiring the errors to converge to zero uniformly over all bounded proper scoring rules. We first establish that calibration always implies proper-calibration, whereas calibeating need not imply proper-calibeating. Second, we show how to guarantee proper-calibeating and proper-multicalibeating. Finally, we demonstrate the equivalence between proper-calibration and universal no regret when best replying to forecasts in decision-making under uncertainty.
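As a point of reference for the quadratic case that the abstract starts from, the following is a minimal Python sketch, not taken from the paper. It assumes binary outcomes and a finite set of announced forecast values, and it uses the standard decomposition of the quadratic (Brier) score into a calibration term and a refinement term. The function names `brier_decomposition` and `expected_quadratic_loss` and the sample data are hypothetical, chosen only for illustration. The paper's uniform-over-all-proper-scoring-rules definitions are not reproduced here.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (brier, calibration, refinement) for binary outcomes in {0, 1}.

    Illustrative helper (hypothetical name), quadratic scoring rule only.
    """
    n = len(forecasts)
    # Group realized outcomes by the forecast value that was announced.
    bins = defaultdict(list)
    for c, a in zip(forecasts, outcomes):
        bins[c].append(a)

    # Average quadratic loss of the forecasts.
    brier = sum((a - c) ** 2 for c, a in zip(forecasts, outcomes)) / n

    calibration = 0.0
    refinement = 0.0
    for c, acts in bins.items():
        mean_a = sum(acts) / len(acts)
        # Calibration: distance between the forecast and the bin's average outcome.
        calibration += len(acts) * (c - mean_a) ** 2
        # Refinement: variance of outcomes inside the bin.
        refinement += sum((a - mean_a) ** 2 for a in acts)
    return brier, calibration / n, refinement / n


def expected_quadratic_loss(q, p):
    """Expected quadratic loss of forecasting q when the true probability is p."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2


# Made-up sample data for illustration only.
forecasts = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]
outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
b, k, r = brier_decomposition(forecasts, outcomes)
print(f"Brier={b:.4f}  calibration={k:.4f}  refinement={r:.4f}")

# Propriety check on a grid: the expected loss is minimized at the true probability.
grid = [i / 100 for i in range(101)]
best = min(grid, key=lambda q: expected_quadratic_loss(q, 0.3))
print("best forecast when p = 0.3:", best)
```

In this sketch the Brier score equals the calibration term plus the refinement term, and the grid search illustrates what "proper" means for the quadratic rule: reporting the true probability minimizes expected loss.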