Yelipe UshaRani

IR · May 3, 2016

A Novel Approach for Imputation of Missing Attribute Values for Efficient Mining of Medical Datasets - Class Based Cluster Approach

Yelipe UshaRani, P. Sammulal

Missing attribute values are quite common in the datasets available in the literature. Values may also be missing because not all attribute values are recorded, for several practical reasons. In either case, the missing attribute values must be fixed before any analysis can be done. Imputation is the first step in analyzing medical datasets, and it has therefore received significant attention from researchers in the medical domain. Data mining researchers have proposed various methods and approaches to impute missing values. However, very few of them concentrate on dimensionality reduction. In this paper, we discuss a novel framework for missing value imputation. Our approach to filling missing values is rooted in class-based clustering and essentially aims at reducing the dimensionality of medical records. We use these dimensionality-reduced records for prediction and classification analysis. A case study shows how imputation is performed using the proposed method.

DB · Mar 10, 2016

An Innovative Imputation and Classification Approach for Accurate Disease Prediction

Yelipe UshaRani, P. Sammulal

Imputation of missing attribute values in medical datasets, as a step toward extracting hidden knowledge from those datasets, is an interesting and very challenging research topic. One cannot eliminate missing values in medical records. Some tests may not have been conducted for cost reasons, values may be missed when conducting clinical trials, and values may simply not have been recorded, to name some of the reasons. Data mining researchers have been proposing various approaches to find and impute missing values in order to increase classification accuracy, so that disease may be predicted accurately. In this paper, we propose a novel approach for imputing missing values and performing classification after the missing values are fixed. The approach is based on clustering and aims at dimensionality reduction of the records. The case study discussed shows that missing values can be imputed efficiently while achieving dimensionality reduction. The importance of the proposed approach for classification is visible in the case study, which assigns a single class label, in contrast to the multi-label assignment that results if dimensionality reduction is not performed.
