Machine learning with incomplete datasets using multi-objective optimization models
This work addresses the problem of handling missing data during model learning, which is a common challenge in various domains, including medical informatics.
This paper proposes an online multi-objective optimization approach that handles missing values while a classification model is being learned. It uses an evolutionary algorithm (NSGA II) to find Pareto-optimal solutions for imputation and model selection, and investigates three different formulations of the imputation objective function.
Most machine learning techniques have been developed to learn from complete data. When a dataset contains missing values, the incomplete data usually has to be preprocessed separately, either by removing the data points with missing values or by imputing them. In this paper, we propose an online approach that handles missing values while a classification model is being learned. To this end, we develop a multi-objective optimization model with two objective functions, one for imputation and one for model selection. We also propose three formulations of the imputation objective function. We use an evolutionary algorithm based on NSGA II to find the Pareto-optimal solutions. We investigate the reliability and robustness of the proposed model through experiments that cover several scenarios for handling missing values and classification. We also describe how the proposed model can contribute to medical informatics. We compare the performance of the three formulations experimentally, and we validate the results of the proposed model against comparable work in the literature.
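To make the notion of Pareto solutions concrete, the following is a minimal sketch of non-dominated filtering for two objectives that are both minimized. It is not the NSGA II based algorithm or any of the objective formulations proposed here; the function names `dominates` and `pareto_front` and the candidate objective values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each candidate is a pair (imputation objective, model-selection objective).
# Both are assumed to be minimized; the numbers below are made up.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates: List[Point] = [
    (0.10, 0.30),
    (0.20, 0.20),
    (0.30, 0.10),
    (0.25, 0.25),  # dominated by (0.20, 0.20)
    (0.40, 0.40),  # dominated by several candidates
]

front = pareto_front(candidates)
print(front)
```

The three surviving candidates each trade one objective against the other, so none can be preferred without an additional criterion. An evolutionary algorithm such as NSGA II searches for such a set rather than for a single optimum.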