ME AP CO ML | Jul 13, 2013

Fractionally-Supervised Classification

arXiv:1307.3598v5 | 26 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses a problem relevant to data scientists: a sub-optimal choice of supervision level in classification. The contribution is incremental, as it builds on existing classification methods.

The paper tackles the problem of choosing the optimal level of supervision in classification. It introduces a fractionally-supervised approach that allows any level of supervision from unsupervised to supervised, using a weighted likelihood to control the roles of labelled and unlabelled data. The approach is demonstrated with Gaussian mixture models on simulated and real data.

Traditionally, there are three species of classification: unsupervised, supervised, and semi-supervised. Supervised and semi-supervised classification differ by whether or not weight is given to unlabelled observations in the classification procedure. In unsupervised classification, or clustering, all observations are unlabelled and hence full weight is given to unlabelled observations. When some observations are unlabelled, it can be very difficult to choose the optimal level of supervision a priori, and the consequences of a sub-optimal choice can be non-trivial. A flexible fractionally-supervised approach to classification is introduced, where any level of supervision, ranging from unsupervised to supervised, can be attained. Our approach uses a weighted likelihood, wherein weights control the relative role that labelled and unlabelled data have in building a classifier. A comparison between our approach and the traditional species is presented using simulated and real data. Gaussian mixture models are used as a vehicle to illustrate our fractionally-supervised classification approach; however, it is broadly applicable and variations on the postulated model can be easily made.
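The weighted-likelihood idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the weights, so this assumes a single weight `alpha` in [0, 1] on the labelled log-likelihood and `1 - alpha` on the unlabelled log-likelihood, for a one-dimensional Gaussian mixture. The function names, the parameter `alpha`, and this parameterisation are illustrative assumptions, not necessarily the paper's formulation.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_loglik(labelled, unlabelled, props, means, sds, alpha):
    """Weighted log-likelihood of a 1-D Gaussian mixture (illustrative).

    labelled   : list of (x, g) pairs, g being the known component index
    unlabelled : list of x values with unknown component
    props      : mixing proportions, one per component
    means, sds : component means and standard deviations
    alpha      : ASSUMED weight on the labelled part; the unlabelled
                 part receives 1 - alpha
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Labelled points contribute the density of their known component.
    ll_lab = sum(
        math.log(props[g] * normal_pdf(x, means[g], sds[g]))
        for x, g in labelled
    )

    # Unlabelled points contribute the full mixture density.
    ll_unlab = sum(
        math.log(sum(p * normal_pdf(x, m, s)
                     for p, m, s in zip(props, means, sds)))
        for x in unlabelled
    )

    return alpha * ll_lab + (1.0 - alpha) * ll_unlab


# Toy data: two labelled points, two unlabelled points, two components.
labelled = [(0.0, 0), (4.0, 1)]
unlabelled = [0.5, 3.5]
params = dict(props=[0.5, 0.5], means=[0.0, 4.0], sds=[1.0, 1.0])

for alpha in (0.0, 0.5, 1.0):
    print(alpha, weighted_loglik(labelled, unlabelled, alpha=alpha, **params))
```

Under this assumed parameterisation, `alpha = 1` uses only the labelled data (a supervised fit), `alpha = 0.5` weights both parts equally and so is proportional to the usual semi-supervised log-likelihood, and `alpha = 0` uses only the unlabelled points. Intermediate values give the in-between levels of supervision the abstract describes. In practice the parameters would be estimated by maximising this objective, for example with an EM-type algorithm, which the sketch leaves out.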
