A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
This work addresses set classification problems, such as diagnosing disease from multiple cell images, and offers an incremental improvement over existing methods.
The paper tackles set classification, where classification is based on sets of observations rather than individual ones. It proposes a statistical framework that selects features from each set, using principal component analysis for feature extraction and multidimensional scaling to represent the features as points. The method achieves better classification results than competing approaches on simulated data and demonstrates benefits in analyzing histopathology images for liver cancer diagnosis.
Set classification problems arise when classification tasks are based on sets of observations as opposed to individual observations. In set classification, a classification rule is trained with $N$ sets of observations, where each set is labeled with class information, and the class label is likewise predicted for a set of observations rather than for a single observation. Data sets for set classification appear, for example, in the diagnosis of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, principal component analysis is used to extract the features of major variation. Multidimensional scaling is then used to represent the features as vector-valued points to which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.
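To make the pipeline described above concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It summarizes each set by its mean (location) and its leading principal directions (major variation), converts pairwise dissimilarities between sets into coordinates with classical multidimensional scaling, and applies a simple nearest-centroid classifier. Several pieces are assumptions made for illustration only: the function names, the particular dissimilarity (mean difference combined with a projection distance between principal subspaces), the `weight` parameter, the nearest-centroid classifier, the synthetic data, and the choice to embed training and test sets jointly.

```python
import numpy as np


def set_features(X, k=1):
    """Summarize one set (n_obs x d array) by its mean and top-k principal directions."""
    mean = X.mean(axis=0)
    # PCA via SVD of the centered observations; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T


def set_distance(f1, f2, weight=1.0):
    """Hypothetical dissimilarity between two sets (an assumption, not from the paper)."""
    (m1, V1), (m2, V2) = f1, f2
    loc = np.linalg.norm(m1 - m2)
    # Comparing projection matrices makes the result invariant to the sign of each PC.
    var = np.linalg.norm(V1 @ V1.T - V2 @ V2.T)
    return np.sqrt(loc ** 2 + weight * var ** 2)


def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS: embed a distance matrix into `dim` coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]           # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))


def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Assign each test point to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]


def run_demo(seed=0, sets_per_class=20, n_obs=50):
    """Synthetic example: two classes with equal means but different major variation."""
    rng = np.random.default_rng(seed)
    scales = {0: np.array([2.0, 0.5]), 1: np.array([0.5, 2.0])}
    sets, labels = [], []
    for label, scale in scales.items():
        for _ in range(sets_per_class):
            sets.append(rng.normal(size=(n_obs, 2)) * scale)
            labels.append(label)
    y = np.array(labels)

    feats = [set_features(X, k=1) for X in sets]
    n = len(feats)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = set_distance(feats[i], feats[j])

    # Simplification: all sets are embedded jointly, then split into train and test.
    Z = classical_mds(D, dim=2)
    train = (np.arange(n) % 10) < 7              # 70% of each class for training
    pred = nearest_centroid_predict(Z[train], y[train], Z[~train])
    return float((pred == y[~train]).mean())


if __name__ == "__main__":
    print("held-out accuracy:", run_demo())
```

In this toy example the two classes share the same mean, so a classifier that used only location features would have little to work with; the class information sits in the direction of major variation, which the principal component features capture. Embedding training and test sets together keeps the sketch short, but a real application would need a way to place new sets into an embedding fitted on the training sets alone.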