STLGAP
Dec 10, 2022

Examining marginal properness in the external validation of survival models with squared and logarithmic losses

arXiv:2212.05260v3
2 citations
h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for reliable model evaluation in survival analysis, particularly for automated procedures like AutoML, though it is incremental as it validates existing methods rather than introducing new ones.

The paper examined the theoretical properness of two common scoring rules for survival analysis, the Integrated Survival Brier Score (ISBS) and Right-Censored Log-Likelihood (RCLL), finding them theoretically improper under a marginal definition but showing in simulations that they behave as proper rules in practice with only minor violations at small sample sizes.

Scoring rules promote rational and honest decision-making, which is important for model evaluation and is becoming increasingly important for automated procedures such as 'AutoML'. In this paper we survey common squared and logarithmic scoring rules for survival analysis, with a focus on their theoretical and empirical properness. We introduce a marginal definition of properness and show that both the Integrated Survival Brier Score (ISBS) and the Right-Censored Log-Likelihood (RCLL) are theoretically improper under this definition. We also investigate a new class of losses that may inform future survival scoring rules. Simulation experiments reveal that both the ISBS and RCLL behave as proper scoring rules in practice. The RCLL showed no violations across all settings, while the ISBS exhibited only minor, negligible violations at extremely small sample sizes, suggesting one can trust results from historical experiments. As such, we advocate for both the RCLL and ISBS in external validation of models, including in automated procedures. However, we note practical challenges in estimating these losses, including estimation of censoring distributions and densities; as such, further research is required to advance development of robust and honest evaluation in survival analysis.
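To make the logarithmic loss concrete, the following is a minimal sketch of a right-censored log-loss, assuming the commonly used form in which an observed event contributes the negative log of the predicted density at the event time and a censored observation contributes the negative log of the predicted survival probability at the censoring time. The helper names (`rcll`, `exponential`, `mean_rcll`), the exponential model, and the toy data are illustrative assumptions, not taken from the paper or its repository.

```python
import math

def rcll(time, event, density, survival):
    """Right-censored log-loss for one observation (lower is better).

    Assumed form: -log f(t) if the event was observed, -log S(t) if censored.
    """
    likelihood = density(time) if event else survival(time)
    return -math.log(likelihood)

def exponential(rate):
    """Hypothetical predicted distribution: exponential with the given rate."""
    density = lambda t: rate * math.exp(-rate * t)
    survival = lambda t: math.exp(-rate * t)
    return density, survival

def mean_rcll(rate, data):
    """Average loss of an exponential prediction over (time, event) pairs."""
    density, survival = exponential(rate)
    return sum(rcll(t, e, density, survival) for t, e in data) / len(data)

# Toy data: (observed time, event indicator); False means censored.
observations = [(2.0, True), (5.0, False), (1.0, True), (3.0, False)]

# Compare a few candidate rates; 2/11 is events divided by total time,
# the maximum-likelihood rate for an exponential model on this data.
for rate in (0.05, 2 / 11, 0.5):
    print(f"rate={rate:.3f}  mean RCLL={mean_rcll(rate, observations):.4f}")
```

On this toy data the average loss is smallest at the maximum-likelihood rate, which illustrates the behaviour one wants from a scoring rule. It does not demonstrate properness in the marginal sense the abstract describes.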

Code Implementations: 1 repo