AI

Apr 18, 2021

Multi-objective Feature Selection with Missing Data in Classification

Yu Xue, Yihang Tang, Xin Xu, Jiayu Liang, Ferrante Neri

arXiv:2104.08747v1

10.1119 citations

Originality: Incremental advance

AI Analysis

This work addresses unreliable feature selection in classification for real-world applications with missing data, but it is incremental as it extends existing multi-objective approaches.

The authors tackled feature selection with missing data by adding reliability as a third objective, and their three-objective model with NSGA-III efficiently addressed the problem on six UCI data sets.

Feature selection (FS) is an important research topic in machine learning. Usually, FS is modelled as a bi-objective optimization problem whose objectives are: 1) classification accuracy; 2) number of features. One of the main issues in real-world applications is missing data. Databases with missing data are likely to be unreliable. Thus, FS performed on a data set missing some data is also unreliable. In order to directly control this issue plaguing the field, we propose in this study a novel modelling of FS: we include reliability as the third objective of the problem. In order to address the modified problem, we propose the application of the non-dominated sorting genetic algorithm-III (NSGA-III). We selected six incomplete data sets from the University of California Irvine (UCI) machine learning repository. We used the mean imputation method to deal with the missing data. In the experiments, k-nearest neighbors (K-NN) is used as the classifier to evaluate the feature subsets. Experimental results show that the proposed three-objective model coupled with NSGA-III efficiently addresses the FS problem for the six data sets included in this study.
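The three-objective formulation described in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation. The abstract does not define the reliability objective, so the sketch assumes a simple proxy: the mean missing rate of the selected features, measured before imputation. NSGA-III is replaced here by exhaustive enumeration, which is only feasible because the made-up toy data has three features. All function names and data values are hypothetical.

```python
import math
from itertools import product

MISSING = None  # marker for a missing value in this toy example


def mean_impute(rows):
    """Replace each missing entry with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is MISSING else r[j] for j in range(n_cols)] for r in rows]


def missing_rates(rows):
    """Fraction of missing entries per feature, computed before imputation."""
    n = len(rows)
    return [sum(r[j] is MISSING for r in rows) / n for j in range(len(rows[0]))]


def knn_loo_error(X, y, mask, k=3):
    """Leave-one-out error rate of k-NN restricted to the selected features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    errors = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (math.dist([xi[j] for j in cols], [xo[j] for j in cols]), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        errors += predicted != y[i]
    return errors / len(X)


def objectives(X, y, rates, mask, k=3):
    """Return (error rate, feature count, unreliability proxy); all are minimised.

    ASSUMPTION: unreliability = mean missing rate of the selected features.
    The paper's actual reliability measure is not given in the abstract.
    """
    selected = [j for j, keep in enumerate(mask) if keep]
    unreliability = sum(rates[j] for j in selected) / len(selected)
    return (knn_loo_error(X, y, mask, k), len(selected), unreliability)


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def pareto_front(scored):
    """Keep the (mask, objectives) pairs that no other pair dominates."""
    return [(m, f) for m, f in scored if not any(dominates(g, f) for _, g in scored)]


# Made-up toy data: 8 samples, 3 features, 2 classes; feature 1 has missing values.
RAW = [
    [0.2, 0.1, 5.0],
    [0.4, MISSING, 1.0],
    [0.3, 0.2, 4.0],
    [0.6, 0.0, 2.0],
    [0.5, 0.9, 5.0],
    [0.7, 1.0, 1.0],
    [0.4, MISSING, 3.0],
    [0.8, 1.1, 2.0],
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

rates = missing_rates(RAW)
X = mean_impute(RAW)

# Exhaustive search over non-empty feature subsets stands in for NSGA-III here.
masks = [m for m in product((0, 1), repeat=3) if any(m)]
scored = [(m, objectives(X, LABELS, rates, m)) for m in masks]
front = pareto_front(scored)

for mask, (err, n_feat, unrel) in front:
    print(mask, f"error={err:.3f}", f"features={n_feat}", f"unreliability={unrel:.3f}")
```

On a real data set with many features, the enumeration step would be replaced by an evolutionary search such as NSGA-III, with the same three-value objective function evaluated for each candidate feature mask.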
