Population-Guided Large Margin Classifier for High-Dimension Low-Sample-Size Problems
This work addresses classification in domains with few samples but many features, such as medical imaging and genomics, though it appears incremental because it builds on existing large margin classifiers.
The paper tackles the challenge of high-dimensional low-sample-size (HDLSS) data in fields like gene expression analysis and computer vision by proposing a novel linear binary classifier called PGLMC, which outperforms state-of-the-art methods on most benchmark data sets and achieves comparable results on the rest.
Many applications, such as gene expression analysis and computer vision, involve high-dimensional low-sample-size (HDLSS) data sets, which pose significant challenges for standard statistical and modern machine learning methods. In this paper, we propose a novel linear binary classifier, the population-guided large margin classifier (PGLMC), which is applicable to any sort of data, including HDLSS data. PGLMC determines its projecting direction w by jointly considering the local structural information of the hyperplane and the statistics of the training samples. The proposed model has several advantages over widely used approaches. First, it is not sensitive to the intercept term b. Second, it performs well on imbalanced data. Third, it is relatively simple to implement using quadratic programming. Fourth, it is robust to model specification across various real applications. We prove the theoretical properties of PGLMC. We conduct a series of evaluations on two simulated and six real-world benchmark data sets, covering DNA classification, digit recognition, medical image analysis, and face recognition. PGLMC outperforms state-of-the-art classification methods in most cases, or at least obtains comparable results.
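To make the setting concrete, the following is a minimal sketch of a linear binary rule of the form sign(w·x + b) on synthetic HDLSS data (200 features, 10 training samples per class). It is not PGLMC: the abstract does not give the optimization problem, so the direction w here is a stand-in (the normalized difference of class means) and b is placed midway between the projected class means. The data generator, the function names, and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def make_hdlss(n_per_class, dim, shift, rng):
    """Draw Gaussian points for labels -1 and +1; class means are -shift / +shift per coordinate."""
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_mean_difference(X, y):
    """Stand-in linear rule (NOT PGLMC): w is the unit-norm difference of class means."""
    mu_pos = mean_vector([x for x, t in zip(X, y) if t == 1])
    mu_neg = mean_vector([x for x, t in zip(X, y) if t == -1])
    w = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(dot(w, w))
    w = [v / norm for v in w]
    # Intercept: hyperplane midway between the projected class means.
    b = -0.5 * (dot(w, mu_pos) + dot(w, mu_neg))
    return w, b


def predict(w, b, x):
    """Linear decision rule sign(w.x + b), with ties assigned to +1."""
    return 1 if dot(w, x) + b >= 0 else -1


rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_hdlss(n_per_class=10, dim=200, shift=0.5, rng=rng)
X_test, y_test = make_hdlss(n_per_class=50, dim=200, shift=0.5, rng=rng)

w, b = fit_mean_difference(X_train, y_train)
predictions = [predict(w, b, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"dim={len(w)}, n_train={len(y_train)}, test accuracy={accuracy:.2f}")
```

The synthetic classes are well separated, so even this simple rule classifies them easily; the sketch only shows the roles of w and b in a linear classifier when the dimension far exceeds the sample size, not the behavior of the proposed method.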