ML LG Sep 16, 2017

Subset Labeled LDA for Large-Scale Multi-Label Classification

Yannis Papanikolaou, Grigorios Tsoumakas

arXiv:1709.05480v1 4.85 citations

Originality: Incremental advance

AI Analysis

This work addresses scalability in multi-label classification with large label sets, though the contribution is incremental: it is a variant of an existing method.

The paper tackles the scalability issues of Labeled Latent Dirichlet Allocation (LLDA) in multi-label classification by introducing Subset LLDA, which scales to problems with hundreds of thousands of labels. In experiments on eight datasets, it shows a steady advantage over the earlier LLDA algorithms and is competitive with extreme multi-label classification methods.

Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm that addresses multi-label learning tasks. Previous work has shown it to perform on par with other state-of-the-art multi-label methods. Nonetheless, as label set sizes increase, LLDA encounters scalability issues. In this work, we introduce Subset LLDA, a simple variant of the standard LLDA algorithm that not only scales effectively to problems with hundreds of thousands of labels but also improves over the LLDA state of the art. We conduct extensive experiments on eight data sets, with label set sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior-LDA, Dep-LDA), as well as with the state of the art in extreme multi-label classification. The results show a steady advantage of our method over the other LLDA algorithms and competitive results compared to the extreme multi-label classification algorithms.
