Peter Mills

ML
5papers
67citations
Novelty33%
AI Score19

5 Papers

MLSep 16, 2018
Solving for multi-class: a survey and synthesis

Peter Mills

Many of the best statistical classification algorithms are binary classifiers that can only distinguish between one of two classes. The number of possible ways of generalizing binary classification to multi-class increases exponentially with the number of classes. There is some indication that the best method will depend on the dataset. Hence, we are particularly interested in data-driven solution design, whether based on prior considerations or on empirical examination of the data. Here we demonstrate how a recursive control language can be used to describe a multitude of different partitioning strategies in multi-class classification, including those in most common use. We use it both to manually construct new partitioning configurations as well as to examine those that have been automatically designed. Eight different strategies were tested on eight different datasets using a support vector machine (SVM) as the base binary classifier. Numerical results suggest that a one-size-fits-all solution consisting of one-versus-one is appropriate for most datasets. Three datasets showed better accuracy using different methods. The best solution for the most improved dataset exploited a property of the data to produce an uncertainty coefficient 36\% higher (0.016 absolute gain) than one-vs.-one. For the same dataset, an adaptive solution that empirically examined the data was also more accurate than one-vs.-one while being faster.

MLJan 27, 2018
Solving for multi-class using orthogonal coding matrices

Peter Mills

A common method of generalizing binary to multi-class classification is the error correcting code (ECC). ECCs may be optimized in a number of ways, for instance by making them orthogonal. Here we test two types of orthogonal ECCs on seven different datasets using three types of binary classifier and compare them with three other multi-class methods: 1 vs. 1, one-versus-the-rest and random ECCs. The first type of orthogonal ECC, in which the codes contain no zeros, admits a fast and simple method of solving for the probabilities. Orthogonal ECCs are always more accurate than random ECCs as predicted by recent literature. Improvments in uncertainty coefficient (U.C.) range between 0.4--17.5% (0.004--0.139, absolute), while improvements in Brier score between 0.7--10.7%. Unfortunately, orthogonal ECCs are rarely more accurate than 1 vs. 1. Disparities are worst when the methods are paired with logistic regression, with orthogonal ECCs never beating 1 vs. 1. When the methods are paired with SVM, the losses are less significant, peaking at 1.5%, relative, 0.011 absolute in uncertainty coefficient and 6.5% in Brier scores. Orthogonal ECCs are always the fastest of the five multi-class methods when paired with linear classifiers. When paired with a piecewise linear classifier, whose classification speed does not depend on the number of training samples, classifications using orthogonal ECCs were always more accurate than the other methods and also faster than 1 vs. 1. Losses against 1 vs. 1 here were higher, peaking at 1.9% (0.017, absolute), in U.C. and 39% in Brier score. Gains in speed ranged between 1.1% and over 100%. Whether the speed increase is worth the penalty in accuracy will depend on the application.

MLAug 20, 2017
Accelerating Kernel Classifiers Through Borders Mapping

Peter Mills

Support vector machines (SVM) and other kernel techniques represent a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant portion of the training data, however, they can be slow, especially for large problems. Piecewise linear classifiers are similarly versatile, yet have the additional advantages of simplicity, ease of interpretation and, if the number of component linear classifiers is not too large, speed. Here we show how a simple, piecewise linear classifier can be trained from a kernel-based classifier in order to improve the classification speed. The method works by finding the root of the difference in conditional probabilities between pairs of opposite classes to build up a representation of the decision boundary. When tested on 17 different datasets, it succeeded in improving the classification speed of a SVM for 12 of them by up to two orders-of-magnitude. Of these, two were less accurate than a simple, linear classifier. The method is best suited to problems with continuum features data and smooth probability functions. Because the component linear classifiers are built up individually from an existing classifier, rather than through a simultaneous optimization procedure, the classifier is also fast to train.

MLApr 15, 2014
Multi-borders classification

Peter Mills

The number of possible methods of generalizing binary classification to multi-class classification increases exponentially with the number of class labels. Often, the best method of doing so will be highly problem dependent. Here we present classification software in which the partitioning of multi-class classification problems into binary classification problems is specified using a recursive control language.

AO-PHFeb 10, 2012
Efficient statistical classification of satellite measurements

Peter Mills

Supervised statistical classification is a vital tool for satellite image processing. It is useful not only when a discrete result, such as feature extraction or surface type, is required, but also for continuum retrievals by dividing the quantity of interest into discrete ranges. Because of the high resolution of modern satellite instruments and because of the requirement for real-time processing, any algorithm has to be fast to be useful. Here we describe an algorithm based on kernel estimation called Adaptive Gaussian Filtering that incorporates several innovations to produce superior efficiency as compared to three other popular methods: k-nearest-neighbour (KNN), Learning Vector Quantization (LVQ) and Support Vector Machines (SVM). This efficiency is gained with no compromises: accuracy is maintained, while estimates of the conditional probabilities are returned. These are useful not only to gauge the accuracy of an estimate in the absence of its true value, but also to re-calibrate a retrieved image and as a proxy for a discretized continuum variable. The algorithm is demonstrated and compared with the other three on a pair of synthetic test classes and to map the waterways of the Netherlands. Software may be found at: http://libagf.sourceforge.net.