LG, ML · Apr 1, 2020

Extreme Multi-label Classification from Aggregated Labels

Yanyao Shen, Hsiang-fu Yu, Sujay Sanghavi, Inderjit Dhillon

arXiv:2004.00198v1 8.510 citations

Originality: Incremental advance

AI Analysis

This work addresses a practical limitation of XMC in applications where aggregated labels are common, such as large-scale tagging or recommendation systems. The contribution is incremental, since it builds on existing XMC methods.

The paper tackles extreme multi-label classification (XMC) when labels are available only for groups of samples, not for individual ones. It develops a scalable algorithm that imputes individual labels from group labels and can be paired with existing XMC methods. Experiments show advantages over existing approaches.

Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input, from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples - but not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes. We develop a new and scalable algorithm to impute individual-sample labels from the group labels; this can be paired with any existing XMC method to solve the aggregated label problem. We characterize the statistical properties of our algorithm under mild assumptions, and provide a new end-to-end framework for MIML as an extension. Experiments on both aggregated label XMC and MIML tasks show the advantages over existing approaches.
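
To make the aggregated-label setting concrete, the following is a minimal sketch of imputing per-sample labels from group labels. It is not the paper's algorithm: it swaps in a simple co-occurrence heuristic and assumes that each sample is identified by an ID and appears in several groups. The function name `impute_labels`, the `threshold` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def impute_labels(groups, threshold=1.0):
    """Impute per-sample labels from group-level labels.

    groups: list of (sample_ids, labels) pairs, where `labels` is the
    label set observed for the whole group (assumed format).
    A label is assigned to a sample if it appears in at least
    `threshold` (a fraction) of the groups that contain the sample.
    This is an illustrative heuristic, not the method from the paper.
    """
    group_count = defaultdict(int)                    # sample -> number of groups
    cooccur = defaultdict(lambda: defaultdict(int))   # sample -> label -> count

    for sample_ids, labels in groups:
        for sample in set(sample_ids):
            group_count[sample] += 1
            for label in set(labels):
                cooccur[sample][label] += 1

    imputed = {}
    for sample, n_groups in group_count.items():
        imputed[sample] = {
            label
            for label, count in cooccur[sample].items()
            if count / n_groups >= threshold
        }
    return imputed


# Toy data: three groups, each with only a group-level label set.
groups = [
    (["a", "b"], {"cat", "dog"}),
    (["a", "c"], {"cat", "bird"}),
    (["b", "c"], {"dog", "bird", "fish"}),
]
imputed = impute_labels(groups)
```

The imputed per-sample label sets could then be passed as ordinary training targets to any standard XMC method, which is the pairing the abstract describes. Lowering `threshold` trades precision for recall in the imputed labels.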
