LG ML May 9, 2023

Towards Understanding Generalization of Macro-AUC in Multi-label Learning

arXiv:2305.05248v26.610 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses a theoretical gap for researchers in multi-label learning, though it is incremental as it builds on existing AUC analysis.

The paper addresses the limited theoretical understanding of Macro-AUC in multi-label learning by deriving generalization bounds for several learning algorithms. It identifies label-wise class imbalance as a critical factor and shows that univariate loss-based methods are more sensitive to this imbalance than pairwise or reweighted ones. Empirical results support the theory.

Macro-AUC is the arithmetic mean of the class-wise AUCs in multi-label learning and is commonly used in practice. However, its theoretical understanding is largely lacking. To address this, we characterize the generalization properties of various learning algorithms based on the corresponding surrogate losses w.r.t. Macro-AUC. We theoretically identify a critical factor of the dataset affecting the generalization bounds: the label-wise class imbalance. Our results on the imbalance-aware error bounds show that the widely used univariate loss-based algorithm is more sensitive to the label-wise class imbalance than the proposed pairwise and reweighted loss-based ones, which probably implies its worse performance. Moreover, empirical results on various datasets corroborate our theoretical findings. To establish these results, we propose a new (and more general) McDiarmid-type concentration inequality, which may be of independent interest.
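To make the definition in the abstract concrete, here is a minimal NumPy sketch of Macro-AUC as the arithmetic mean of per-label AUCs, where each per-label AUC is the fraction of (positive, negative) instance pairs that the scores rank correctly. The function names, the toy data, and the convention of counting ties as one half are assumptions made for illustration and are not taken from the paper. The `label_imbalance` helper is likewise a hypothetical proxy for label-wise class imbalance, not necessarily the exact quantity used in the paper's bounds.

```python
import numpy as np

def label_auc(y_true, scores):
    """AUC for a single label: share of (positive, negative) pairs ranked correctly.

    Ties are counted as 0.5 (an assumed convention).
    """
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC is undefined when a label has only one class")
    # Pairwise score differences between every positive and every negative.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def macro_auc(Y, S):
    """Arithmetic mean of the per-label AUCs over the columns of Y and S."""
    return float(np.mean([label_auc(Y[:, k], S[:, k]) for k in range(Y.shape[1])]))

def label_imbalance(Y):
    """Hypothetical imbalance proxy: min(#positives, #negatives) / n for each label."""
    n = Y.shape[0]
    n_pos = Y.sum(axis=0)
    return np.minimum(n_pos, n - n_pos) / n

# Toy data (assumed): 4 instances, 2 labels.
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
S = np.array([[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.5, 0.1]])

print(macro_auc(Y, S))      # 0.875 (label AUCs are 0.75 and 1.0)
print(label_imbalance(Y))   # [0.5 0.5]
```

A label with very few positives (or very few negatives) has a small imbalance value under this proxy and contributes few positive-negative pairs to its AUC, which gives some intuition for why per-label imbalance could matter for generalization.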
