Optimizing F-measure: A Tale of Two Approaches
This work addresses the problem of choosing among F-measure optimization methods for practitioners working with imbalanced data. Its contribution is incremental, since it compares existing approaches rather than introducing new ones.
The paper investigates two approaches to optimizing F-measures in imbalanced-data tasks: empirical utility maximization (EUM) and the decision-theoretic approach. It finds that the two are asymptotically equivalent when the models are accurate. Empirically, EUM appears more robust to model misspecification, while the decision-theoretic approach, given a good model, appears better for rare classes and for a common domain adaptation scenario.
F-measures are popular performance metrics, particularly for tasks with imbalanced data sets. Algorithms for learning to maximize F-measures follow two approaches: the empirical utility maximization (EUM) approach learns a classifier having optimal performance on training data, while the decision-theoretic approach learns a probabilistic model and then predicts labels with maximum expected F-measure. In this paper, we investigate the theoretical justifications and connections for these two approaches, and we study the conditions under which one approach is preferable to the other using synthetic and real datasets. Our results suggest that, given accurate models, the two approaches are asymptotically equivalent as the training and test sets grow large. Nevertheless, empirically, the EUM approach appears to be more robust against model misspecification, and given a good model, the decision-theoretic approach appears to be better for handling rare classes and a common domain adaptation scenario.
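
To make the contrast between the two approaches concrete, the following is a minimal sketch and not the algorithms studied in the paper. It makes several illustrative assumptions: F1 is used as the F-measure, F1 is taken to be 1 when there are neither true nor predicted positives, the EUM side is reduced to tuning a single score threshold on training data, and the decision-theoretic side assumes independent labels with known marginal probabilities, computes the expected F1 by brute-force enumeration (feasible only for a handful of instances), and searches only over predictions that label the k most probable instances as positive. All function names are hypothetical.

```python
import itertools
import numpy as np


def f1(y_true, y_pred):
    """F1 = 2*TP / (#actual positives + #predicted positives).

    Assumed convention: F1 is 1 when both counts are zero.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    denom = int(y_true.sum() + y_pred.sum())
    return 1.0 if denom == 0 else 2.0 * tp / denom


def eum_threshold(scores, labels):
    """EUM-style step: pick the score threshold with the best training F1."""
    scores = np.asarray(scores)
    candidates = np.unique(scores)
    best = max(candidates, key=lambda t: f1(labels, (scores >= t).astype(int)))
    return float(best)


def expected_f1(probs, y_pred):
    """Exact expected F1 under independent Bernoulli labels.

    Enumerates all 2^n label vectors, so it is only usable for small n.
    """
    probs = np.asarray(probs, dtype=float)
    total = 0.0
    for y in itertools.product([0, 1], repeat=len(probs)):
        y = np.array(y)
        weight = float(np.prod(np.where(y == 1, probs, 1.0 - probs)))
        total += weight * f1(y, y_pred)
    return total


def decision_theoretic_predict(probs):
    """Decision-theoretic step: among top-k predictions, maximize expected F1."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs)  # most probable instances first
    best_pred, best_val = None, -1.0
    for k in range(len(probs) + 1):
        pred = np.zeros(len(probs), dtype=int)
        pred[order[:k]] = 1
        val = expected_f1(probs, pred)
        if val > best_val:
            best_pred, best_val = pred, val
    return best_pred, best_val
```

In this sketch, `eum_threshold` fits a decision rule directly to training labels, whereas `decision_theoretic_predict` never sees labels and relies entirely on the quality of the probabilities it is given. For example, with three test instances that each have probability 0.3 of being positive, the expected-F1 search labels all three as positive even though none exceeds 0.5, which illustrates why maximizing expected F-measure differs from thresholding probabilities at 0.5.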