LG · ML · Jun 12, 2018

A One-Sided Classification Toolkit with Applications in the Analysis of Spectroscopy Data

arXiv:1806.06915v1 · 1.52 citations

Originality: Synthesis-oriented

AI Analysis

The dissertation addresses the challenge of handling 'unexpected' outliers in spectroscopy data for hazardous material detection, an incremental improvement in a domain-specific context.

This dissertation tackled the problem of separating hazardous chlorinated solvents from other materials using Raman spectra by applying one-sided classification algorithms, finding that these classifiers are more robust than conventional multi-class classifiers when test data come from a different distribution than training samples.

This dissertation investigates the use of one-sided classification algorithms in the application of separating hazardous chlorinated solvents from other materials, based on their Raman spectra. The experimentation is carried out using a new one-sided classification toolkit that was designed and developed from the ground up. In the one-sided classification paradigm, the objective is to separate elements of the target class from all outliers. These one-sided classifiers are generally chosen, in practice, when there is a deficiency of some sort in the training examples. Sometimes outlier examples can be rare, expensive to label, or even entirely absent. However, this author would like to note that they can be equally applicable when outlier examples are plentiful but nonetheless not statistically representative of the complete outlier concept. It is this scenario that is explicitly dealt with in this research work. In these circumstances, one-sided classifiers have been found to be more robust than conventional multi-class classifiers. The term "unexpected" outliers is introduced to represent outlier examples, encountered in the test set, that have been taken from a different distribution to the training set examples. These are examples that are a result of an inadequate representation of all possible outliers in the training set. It can often be impossible to fully characterise outlier examples given the fact that they can represent the immeasurable quantity of "everything else" that is not a target. The findings from this research have shown the potential drawbacks of using conventional multi-class classification algorithms when the test data come from a completely different distribution to that of the training samples.
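The "unexpected" outlier scenario described in the abstract can be illustrated with a small synthetic sketch. This is not the dissertation's toolkit and does not use Raman spectra. It assumes 2-D Gaussian blobs as stand-in data, a centroid-plus-radius rule as a stand-in one-sided classifier, and a nearest-centroid rule as a stand-in two-class classifier. All function names, blob positions, and the 95% coverage setting are hypothetical choices made for illustration only.

```python
import math
import random


def sample_blob(rng, centre, n, spread=1.0):
    """Draw n two-dimensional points from an isotropic Gaussian blob."""
    return [(rng.gauss(centre[0], spread), rng.gauss(centre[1], spread))
            for _ in range(n)]


def centroid(points):
    """Mean position of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit_one_sided(targets, keep=0.95):
    """One-sided model (assumed form): target centroid plus a radius that
    covers roughly `keep` of the training targets. No outliers are used."""
    c = centroid(targets)
    dists = sorted(math.dist(p, c) for p in targets)
    radius = dists[int(keep * (len(dists) - 1))]
    return c, radius


def one_sided_accepts(model, p):
    """Accept p as a target only if it lies inside the learned radius."""
    c, radius = model
    return math.dist(p, c) <= radius


def fit_two_class(targets, outliers):
    """Two-class model (assumed form): one centroid per training class."""
    return centroid(targets), centroid(outliers)


def two_class_accepts(model, p):
    """Accept p as a target if it is closer to the target centroid."""
    c_target, c_outlier = model
    return math.dist(p, c_target) <= math.dist(p, c_outlier)


def accept_rate(accepts, model, points):
    """Fraction of points that the given model labels as target."""
    return sum(accepts(model, p) for p in points) / len(points)


rng = random.Random(0)
train_targets = sample_blob(rng, (0.0, 0.0), 200)
train_outliers = sample_blob(rng, (6.0, 0.0), 200)   # outliers seen in training
unexpected = sample_blob(rng, (-6.0, 0.0), 200)      # different distribution
test_targets = sample_blob(rng, (0.0, 0.0), 200)

one_sided = fit_one_sided(train_targets)
two_class = fit_two_class(train_targets, train_outliers)

# Share of unexpected outliers wrongly accepted as targets by each model.
one_sided_false_accepts = accept_rate(one_sided_accepts, one_sided, unexpected)
two_class_false_accepts = accept_rate(two_class_accepts, two_class, unexpected)
# Share of genuine test targets that the one-sided model still accepts.
one_sided_target_accepts = accept_rate(one_sided_accepts, one_sided, test_targets)

print("two-class false accepts:", two_class_false_accepts)
print("one-sided false accepts:", one_sided_false_accepts)
print("one-sided target accepts:", one_sided_target_accepts)
```

In this toy setup the two-class rule only learns a boundary between the targets and the outliers it was shown, so points arriving from the opposite side fall on the target side of that boundary. The one-sided rule describes the target region alone, which is why it can reject points from regions that never appeared in training. Real one-sided classifiers and real spectra are considerably more involved.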
