ML LG, Feb 15, 2017

Nearest Labelset Using Double Distances for Multi-label Classification

Hyukjun Gweon, Matthias Schonlau, Stefan Steiner

arXiv:1702.04684

Originality: Incremental advance

AI Analysis

The paper addresses the problem of exploiting label correlations in multi-label classification. The advance is incremental because it builds on existing nearest-labelset ideas.

The paper proposes NLDD, which predicts for each new instance the labelset observed in the training data that minimizes a weighted sum of feature-space and label-space distances, with the weights estimated via binomial regression. In experiments it outperforms other methods on average in Hamming loss, 0/1 loss, and multi-label accuracy, and ranks second after ECC on the F-measure.

Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this paper we propose a novel approach, Nearest Labelset using Double Distances (NLDD), that predicts the labelset observed in the training data that minimizes a weighted sum of the distances in both the feature space and the label space to the new instance. The weights specify the relative tradeoff between the two distances. The weights are estimated from a binomial regression of the number of misclassified labels as a function of the two distances. Model parameters are estimated by maximum likelihood. NLDD only considers labelsets observed in the training data, thus implicitly taking into account label dependencies. Experiments on benchmark multi-label data sets show that the proposed method on average outperforms other well-known approaches in terms of Hamming loss, 0/1 loss, and multi-label accuracy and ranks second after ECC on the F-measure.
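The prediction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: both distances are taken to be Euclidean, the new instance's position in label space is represented by per-label probability estimates (`p_new`, for example from independent binary classifiers), and the weights `w_feature` and `w_label` are passed in as plain numbers rather than estimated by binomial regression. The function name `nldd_predict` and all parameter names are hypothetical.

```python
import numpy as np


def nldd_predict(x_new, p_new, X_train, Y_train, w_feature=1.0, w_label=1.0):
    """Return the training labelset with the smallest weighted double distance.

    x_new     : (d,) feature vector of the new instance
    p_new     : (L,) assumed per-label probability estimates for the new instance
    X_train   : (n, d) training feature matrix
    Y_train   : (n, L) binary label matrix; each row is an observed labelset
    w_feature : assumed weight on the feature-space distance
    w_label   : assumed weight on the label-space distance
    """
    x_new = np.asarray(x_new, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    Y_train = np.asarray(Y_train, dtype=float)

    # Distance from the new instance to every training instance in feature space.
    d_feature = np.linalg.norm(X_train - x_new, axis=1)
    # Distance from the estimated label probabilities to every observed labelset.
    d_label = np.linalg.norm(Y_train - p_new, axis=1)

    # Weighted sum of the two distances; the minimizer supplies the prediction.
    score = w_feature * d_feature + w_label * d_label
    best = int(np.argmin(score))
    return Y_train[best].astype(int)


if __name__ == "__main__":
    X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    Y_train = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
    print(nldd_predict([0.9, 1.2], [0.1, 0.8, 0.7], X_train, Y_train))
```

Because the candidate set is restricted to rows of `Y_train`, the sketch can only return a labelset that was actually observed in training, which is the mechanism the abstract credits for implicitly capturing label dependencies.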
