ML LG Jun 24, 2021

Label Disentanglement in Partition-based Extreme Multilabel Classification

Xuanqing Liu, Wei-Cheng Chang, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon

arXiv:2106.12751v19.411 citations

Originality: Incremental advance

AI Analysis

This paper addresses scalability issues in extreme multi-label classification for applications with large output spaces, such as text tagging or recommendation systems, by improving label assignment. The advance is incremental over existing partition-based methods.

The paper tackles the problem of multi-modal labels in partition-based extreme multi-label classification by proposing a method that disentangles labels through non-exclusive clustering, achieving state-of-the-art results on four benchmarks.

Partition-based methods are increasingly used in extreme multi-label classification (XMC) problems due to their scalability to large output spaces (e.g., millions of labels or more). However, existing methods partition the large label space into mutually exclusive clusters, which is sub-optimal when labels have multi-modality and rich semantics. For instance, the label "Apple" can be the fruit or the brand name, which leads to the following research question: can we disentangle these multi-modal labels with non-exclusive clustering tailored for downstream XMC tasks? In this paper, we show that the label assignment problem in partition-based XMC can be formulated as an optimization problem, with the objective of maximizing precision rates. This leads to an efficient algorithm to form flexible and overlapped label clusters, and a method that alternately optimizes the cluster assignments and the model parameters for partition-based XMC. Experimental results on synthetic and real datasets show that our method can successfully disentangle multi-modal labels, leading to state-of-the-art (SOTA) results on four XMC benchmarks.
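
To make the contrast between exclusive and overlapped clusters concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the optimization details, so the sketch assumes a hypothetical label-to-cluster affinity matrix `scores` and simply lets each label join its `k` highest-scoring clusters. The function name `overlapping_assignment`, the parameter `k`, and the toy numbers are all illustrative assumptions.

```python
import numpy as np


def overlapping_assignment(scores: np.ndarray, k: int) -> np.ndarray:
    """Assign each label to its k highest-scoring clusters.

    scores: (num_labels, num_clusters) array, where scores[l, c] is a
        hypothetical affinity between label l and cluster c (an assumption
        made for this sketch, not a quantity defined in the abstract).
    Returns a binary (num_labels, num_clusters) assignment matrix.
    """
    num_labels, num_clusters = scores.shape
    if not 1 <= k <= num_clusters:
        raise ValueError("k must be between 1 and the number of clusters")
    # Column indices of the k largest scores in each row
    # (the order among those k indices is arbitrary).
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    assignment = np.zeros((num_labels, num_clusters), dtype=np.int8)
    np.put_along_axis(assignment, top_k, 1, axis=1)
    return assignment


# Toy affinities: row 0 plays the role of a multi-modal label such as
# "Apple", which scores highly in two different clusters.
scores = np.array([
    [0.9, 0.8, 0.1],
    [0.7, 0.1, 0.2],
    [0.1, 0.2, 0.9],
])

exclusive = overlapping_assignment(scores, k=1)   # one cluster per label
overlapped = overlapping_assignment(scores, k=2)  # two clusters per label
```

With `k=1` the multi-modal label in row 0 is forced into a single cluster, which is the mutually exclusive setting the abstract calls sub-optimal. With `k=2` it appears in both of its high-affinity clusters. In the paper the assignment is derived from a precision-maximizing objective and alternated with model training, which this sketch does not attempt to reproduce.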
