ML · LG · Oct 29, 2022

Reformulating van Rijsbergen's $F_β$ metric for weighted binary cross-entropy

arXiv:2210.16458v3
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of improving classification performance and interpretability for machine learning practitioners. It is rated incremental because it builds on existing metrics and loss functions.

Model training can be suboptimal when performance metrics are kept separate from loss functions. This paper addresses that by reformulating van Rijsbergen's $F_β$ metric so that it can be integrated with weighted binary cross-entropy, and it reports a 14% boost in $F_1$ score on IMDB text data.

Separating performance metrics from gradient-based loss functions may not always give optimal results and may miss vital aggregate information. This paper investigates incorporating a performance metric alongside differentiable loss functions to inform training outcomes. The goal is to guide model performance and interpretation by assuming statistical distributions on this performance metric for dynamic weighting. The focus is on van Rijsbergen's $F_β$ metric, a popular choice for gauging classification performance. Through distributional assumptions on $F_β$, an intermediary link can be established to the standard binary cross-entropy via dynamic penalty weights. First, the $F_β$ metric is reformulated to facilitate assuming statistical distributions, with accompanying proofs for the cumulative distribution function. These probabilities are used within a knee curve algorithm to find an optimal $β$, denoted $β_{opt}$. This $β_{opt}$ is used as a weight or penalty in the proposed weighted binary cross-entropy. Experimentation on publicly available data, along with benchmark analysis, mostly yields better and more interpretable results than the baseline for both imbalanced and balanced classes. For example, for the IMDB text data with known labeling errors, a 14% boost in $F_1$ score is shown. The results also reveal commonalities between the penalty model families derived in this paper and the suitability of recall-centric or precision-centric parameters used in the optimization. The flexibility of this methodology can enhance interpretation.
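
To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of the standard $F_β$ score and of a binary cross-entropy whose positive-class term is scaled by a weight. It is an illustration under stated assumptions, not the paper's method: the function names `f_beta` and `weighted_bce` and the parameter `pos_weight` are hypothetical, `pos_weight` merely stands in for $β_{opt}$, and neither the paper's reformulated $F_β$, its penalty model families, nor the knee curve selection of $β_{opt}$ is reproduced.

```python
import math


def f_beta(tp, fp, fn, beta=1.0):
    """Standard F_beta from confusion-matrix counts.

    beta > 1 favours recall, beta < 1 favours precision.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def weighted_bce(y_true, y_prob, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with the positive-class term scaled.

    Assumption: pos_weight is a hypothetical stand-in for beta_opt; the
    paper's exact weighting scheme is not given in the abstract.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Toy data (made up for illustration only).
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6]

plain = weighted_bce(y_true, y_prob)                     # pos_weight = 1
recall_leaning = weighted_bce(y_true, y_prob, pos_weight=2.0)
```

With `pos_weight` above 1, errors on positive examples cost more, which pushes a model toward recall; a value below 1 does the opposite. That is the general sense in which a $β$-derived weight can make the loss recall-centric or precision-centric.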

