Explainable Data Imputation using Constraints
This paper addresses the bias and inference errors that missing values cause in data analysis, with a focus on explainability, though it appears to be an incremental improvement over existing imputation methods.
The paper tackles the problem of missing or anomalous data values by introducing a new imputation algorithm that handles different data types and association constraints. The algorithm also generates a human-readable explanation for each imputation, and the experiments show results competitive with state-of-the-art techniques.
Data values in a dataset can be missing or anomalous due to mishandling or human error. Analysing data with missing values can introduce bias and distort inferences. Several analysis methods, such as principal component analysis or singular value decomposition, require complete data. Many approaches impute numeric data, some do not consider the dependency of attributes on other attributes, and some require human intervention and domain knowledge. We present a new algorithm for data imputation based on values of different data types and their association constraints in the data, which no current system handles. We show experimental results using different metrics, comparing our algorithm with state-of-the-art imputation techniques. Our algorithm not only imputes the missing values but also generates human-readable explanations describing the significance of the attributes used for every imputation.
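To make the idea of imputing from attribute associations and explaining each imputation concrete, here is a minimal Python sketch. It is not the algorithm from the paper, whose details are not given here. It assumes a hypothetical association in which the attribute "city" determines "country", fills a missing value with the most frequent value among complete rows that agree on the determining attributes, and emits a one-line explanation. The function name impute_with_explanation, its parameters, and the sample rows are all illustrative assumptions.

```python
from collections import Counter


def impute_with_explanation(rows, target, determinants):
    """Fill missing `target` values (None) from rows that agree on `determinants`.

    Hypothetical helper for illustration only; not the paper's method.
    Returns a new list of rows and a list of human-readable explanations.
    """
    # Count the target values observed for each combination of determinant values.
    votes = {}
    for row in rows:
        if row[target] is None or any(row[d] is None for d in determinants):
            continue
        key = tuple(row[d] for d in determinants)
        votes.setdefault(key, Counter())[row[target]] += 1

    imputed, explanations = [], []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the input rows are left unchanged
        if row[target] is None:
            key = tuple(row[d] for d in determinants)
            counter = votes.get(key)
            if counter:  # impute only when supporting rows exist
                value, support = counter.most_common(1)[0]
                total = sum(counter.values())
                new_row[target] = value
                conditions = ", ".join(f"{d}={row[d]!r}" for d in determinants)
                explanations.append(
                    f"row {i}: set {target}={value!r} because {support} of {total} "
                    f"complete rows with {conditions} have that value"
                )
        imputed.append(new_row)
    return imputed, explanations


# Made-up sample data: the last row is missing its country.
rows = [
    {"city": "Paris", "country": "France"},
    {"city": "Paris", "country": "France"},
    {"city": "Lyon", "country": "France"},
    {"city": "Berlin", "country": "Germany"},
    {"city": "Paris", "country": None},
]
imputed, explanations = impute_with_explanation(rows, "country", ["city"])
for line in explanations:
    print(line)
```

In this sketch the explanation reports only how many supporting rows agree with the imputed value. A method along the lines the abstract describes would presumably also have to discover which attributes are associated and weigh their significance, which this example simply takes as given.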