AI · Apr 18, 2021

Multi-objective Feature Selection with Missing Data in Classification

arXiv:2104.08747v1119 citations
Originality: Incremental advance
AI Analysis

This work addresses unreliable feature selection in classification for real-world applications with missing data, but it is incremental as it extends existing multi-objective approaches.

The authors tackled feature selection with missing data by adding reliability as a third objective, and their three-objective model with NSGA-III efficiently addressed the problem on six UCI data sets.

Feature selection (FS) is an important research topic in machine learning. Usually, FS is modelled as a bi-objective optimization problem whose objectives are: 1) classification accuracy; 2) number of features. One of the main issues in real-world applications is missing data. Databases with missing data are likely to be unreliable. Thus, FS performed on a data set missing some data is also unreliable. In order to directly control this issue plaguing the field, we propose in this study a novel modelling of FS: we include reliability as the third objective of the problem. In order to address the modified problem, we propose the application of the non-dominated sorting genetic algorithm-III (NSGA-III). We selected six incomplete data sets from the University of California Irvine (UCI) machine learning repository. We used the mean imputation method to deal with the missing data. In the experiments, k-nearest neighbors (K-NN) is used as the classifier to evaluate the feature subsets. Experimental results show that the proposed three-objective model coupled with NSGA-III efficiently addresses the FS problem for the six data sets included in this study.
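
To make the three-objective formulation concrete, the sketch below scores one candidate feature subset the way a multi-objective optimizer such as NSGA-III would need it scored. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the reliability objective, so the sketch assumes it can be approximated by the fraction of missing cells among the selected features (lower is better); leave-one-out evaluation and the names `mean_impute`, `loo_knn_error`, and `evaluate_subset` are likewise hypothetical. The NSGA-III search itself is not shown.

```python
from collections import Counter
from math import isnan


def mean_impute(X):
    """Replace each NaN by the mean of the observed values in its column."""
    means = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if not isnan(row[j])]
        # Fall back to 0.0 for a column with no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if isnan(v) else v for j, v in enumerate(row)] for row in X]


def loo_knn_error(X, y, k=3):
    """Leave-one-out error rate of a k-NN classifier (squared Euclidean distance)."""
    errors = 0
    for i, xi in enumerate(X):
        distances = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X)
            if j != i
        )
        votes = Counter(y[j] for _, j in distances[:k])
        if votes.most_common(1)[0][0] != y[i]:
            errors += 1
    return errors / len(X)


def evaluate_subset(X_raw, y, mask, k=3):
    """Return (error, n_features, missing_rate); all three are minimised.

    ASSUMPTION: missing_rate stands in for the paper's reliability objective,
    which the abstract does not define.
    """
    selected = [j for j, keep in enumerate(mask) if keep]
    if not selected:
        # An empty subset cannot classify anything: assign the worst scores.
        return (1.0, 0, 1.0)
    X_sub_raw = [[row[j] for j in selected] for row in X_raw]
    n_cells = len(X_sub_raw) * len(selected)
    missing_rate = sum(isnan(v) for row in X_sub_raw for v in row) / n_cells
    X_sub = mean_impute(X_sub_raw)
    return (loo_knn_error(X_sub, y, k), len(selected), missing_rate)
```

A binary mask such as `[1, 0, 0]` selects the first feature only; an optimizer would compare the returned triples by Pareto dominance rather than collapsing them into a single score.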

