DPERC: Direct Parameter Estimation for Mixed Data
This paper addresses a common data preprocessing challenge in statistical and machine learning applications, though it appears to be an incremental improvement over existing direct parameter estimation methods.
The paper tackles the problem of estimating covariance matrices from mixed data with missing values in the continuous features. It proposes DPERC, which leverages the categorical features to improve the estimate. Results show competitive performance compared to contemporary techniques and demonstrate its utility for correlation heatmap visualization.
The covariance matrix is foundational to numerous statistical and machine learning applications, such as Principal Component Analysis and correlation heatmaps. However, missing values within datasets present a formidable obstacle to accurately estimating this matrix. While imputation methods offer one avenue for addressing this challenge, they often entail a trade-off between computational efficiency and estimation accuracy. Consequently, attention has shifted towards direct parameter estimation, given its precision and reduced computational burden. In this paper, we propose Direct Parameter Estimation for Randomly Missing Data with Categorical Features (DPERC), an efficient approach for direct parameter estimation tailored to mixed data that contains missing values within continuous features. Our method is motivated by the observation that information from categorical features can significantly enhance covariance matrix estimation for continuous features, and it effectively harnesses the information embedded within mixed data structures. Through comprehensive evaluations on diverse datasets, we demonstrate the competitive performance of DPERC compared to various contemporary techniques. We also show experimentally that DPERC is a valuable tool for visualizing the correlation heatmap.
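To make the setting concrete, the following is a minimal sketch of the two ideas the abstract contrasts. It is not the DPERC algorithm, whose details are not given in this section. It assumes synthetic data with one hypothetical categorical column (`group`) and two continuous columns (`x1`, `x2`) whose entries are missing completely at random. It compares mean imputation with a simple direct estimate from available pairs (pandas `DataFrame.cov`, which excludes missing values pairwise), and then computes the same direct estimate within each category to show that the categorical feature carries information about the covariance structure.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic mixed data (all names and parameters here are illustrative assumptions).
n = 2000
group = rng.integers(0, 2, size=n)               # hypothetical categorical feature
means = np.array([[0.0, 0.0], [3.0, -3.0]])      # category-specific means
cov_within = np.array([[1.0, 0.6], [0.6, 1.0]])  # covariance shared within each category
x = means[group] + rng.multivariate_normal(np.zeros(2), cov_within, size=n)

df = pd.DataFrame(x, columns=["x1", "x2"])
df["group"] = group
cov_full = df[["x1", "x2"]].cov()                # reference: covariance on complete data

# Drop 30% of the entries in each continuous column completely at random.
drop = rng.random((n, 2)) < 0.3
df_missing = df.copy()
df_missing[["x1", "x2"]] = df_missing[["x1", "x2"]].mask(drop)

# Baseline: mean imputation followed by the usual covariance estimate.
cont = df_missing[["x1", "x2"]]
cov_imputed = cont.fillna(cont.mean()).cov()

# Simple direct estimate: use the available pairs, with no imputation step.
cov_pairwise = cont.cov()

# Direct estimate within each category of the categorical feature.
cov_by_group = df_missing.groupby("group")[["x1", "x2"]].cov()

print("complete data:\n", cov_full)
print("mean imputation:\n", cov_imputed)
print("pairwise direct estimate:\n", cov_pairwise)
print("per-category direct estimate:\n", cov_by_group)
```

In this construction mean imputation shrinks the variances, because imputed entries sit exactly at the column mean, while the pairwise estimate does not. The within-category covariance between `x1` and `x2` is positive even though the pooled covariance is negative, which is the kind of structure a method that uses categorical features can exploit.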