LG | Feb 12, 2023

Review of Extreme Multilabel Classification

Arpan Dasgupta, Preeti Lamba, Ankita Kushwaha, Kiran Ravish, Siddhant Katyan, Shrutimoy Das, Pawan Kumar

arXiv:2302.05971v3 | 10.76 citations | h-index: 29

Originality: Synthesis-oriented

AI Analysis

This is an incremental review paper summarizing existing methods for extreme multi-label classification (XMLC), a problem relevant to researchers and practitioners dealing with large-scale classification tasks.

The paper addresses the challenge of handling a very large number of labels, where traditional methods fail to scale. It surveys approaches such as label and feature embeddings, and discusses the metrics used to evaluate predictions for both head and tail labels.

Extreme multi-label classification, or XMLC, is an active area of interest in machine learning. Compared to traditional multi-label classification, the number of labels here is extremely large, hence the name. Classical one-versus-all classification does not scale in this setting because of the large number of labels; the same is true for any other classifier. Embedding labels and features into a lower-dimensional space is a common first step in many XMLC methods. A further issue is the existence of head and tail labels, where tail labels are those that occur in a relatively small number of samples; tail labels create difficulties during embedding. The area has invited a wide range of approaches, including bit compression motivated by compressed sensing, tree-based embeddings, deep-learning-based latent-space embeddings (including ones that use attention weights), linear-algebra-based embeddings such as SVD, clustering, and hashing, to name a few. The community has also developed a useful set of metrics for evaluating whether predictions are correct for head and tail labels.
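The abstract names SVD as one linear-algebra route to label embedding. The following is a minimal sketch of that general idea, not the method of any specific paper covered by the review. It assumes dense NumPy arrays and a toy problem size. The function names (`fit_label_embedding`, `predict_scores`, `top_k_labels`), the least-squares regressor, and the random toy data are all illustrative assumptions.

```python
import numpy as np


def fit_label_embedding(X, Y, k):
    """Fit a rank-k label embedding and a linear map from features to it.

    X: (num_samples, num_features) feature matrix.
    Y: (num_samples, num_labels) binary label matrix.
    k: embedding dimension, assumed <= min(num_samples, num_labels).
    """
    # Right singular vectors of Y span the dominant directions of label space.
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    V_k = Vt[:k].T                      # (num_labels, k) projection basis
    Z = Y @ V_k                         # (num_samples, k) compressed labels
    # Least-squares regressor from features to the compressed label space
    # (an illustrative choice; any regressor could be substituted).
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)   # (num_features, k)
    return W, V_k


def predict_scores(X, W, V_k):
    """Predict in the embedded space, then map back to per-label scores."""
    return (X @ W) @ V_k.T              # (num_samples, num_labels)


def top_k_labels(scores, k):
    """Indices of the k highest-scoring labels for each sample."""
    return np.argsort(-scores, axis=1)[:, :k]


# Toy data (hypothetical sizes; real XMLC label sets are far larger and sparse).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
Y = (rng.random((200, 50)) < 0.05).astype(float)

W, V_k = fit_label_embedding(X, Y, k=8)
scores = predict_scores(X, W, V_k)
top3 = top_k_labels(scores, k=3)
```

The regressor here predicts 8 values per sample instead of 50, which is the source of the savings when the label count is large. A truncated basis of this kind mostly captures frequent labels, which is one way to see why tail labels cause trouble during embedding.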
