LG IR Jul 26, 2022

On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification

Erik Schultheis, Marek Wydmuch, Rohit Babbar, Krzysztof Dembczyński

arXiv:2207.13186v1 16.533 citations h-index: 15

Originality: Synthesis-oriented

AI Analysis

This work addresses limitations in a standard method for extreme multi-label classification, which makes it relevant to researchers and practitioners working on large-scale classification tasks. It is incremental, however, because it critiques an existing approach and proposes alternatives rather than introducing a new paradigm.

The paper critically revises the propensity model for handling missing and long-tail labels in extreme multi-label classification, showing its application is debatable and presenting alternative recipes inspired by search engines and recommender systems.

The propensity model introduced by Jain et al. (2016) has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC). In this paper, we critically revise this approach, showing that despite its theoretical soundness, its application in contemporary XMLC works is debatable. We exhaustively discuss the flaws of the propensity-based approach and present several recipes, some of them related to solutions used in search engines and recommender systems, that we believe constitute promising alternatives to be followed in XMLC.
