LG | ML | May 10, 2023

Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods

arXiv:2305.06044v2 | 9 citations
Originality: Synthesis-oriented
AI Analysis

The paper provides practical guidance for researchers and practitioners in data analysis, but its contribution is incremental: it compares existing methods for a specific visualization task.

The paper tackles the problem of visualizing correlation matrices with missing data by comparing imputation and direct parameter estimation methods. It finds that plotting from imputed data can lead to misleading inferences and recommends DPER based on the experimental results.

Correlation matrix visualization is essential for understanding the relationships between variables in a dataset, but missing data can pose a significant challenge in estimating correlation coefficients. In this paper, we compare the effects of various missing data methods on the correlation plot, focusing on two common missing patterns: random and monotone. We aim to provide practical strategies and recommendations for researchers and practitioners in creating and analyzing the correlation plot. Our experimental results suggest that while imputation is commonly used for missing data, using imputed data for plotting the correlation matrix may lead to a significantly misleading inference of the relation between the features. We recommend using DPER, a direct parameter estimation approach, for plotting the correlation matrix based on its performance in the experiments.
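The effect the abstract describes can be illustrated with a small simulation. The sketch below is not the paper's experiment and does not implement DPER: it assumes two Gaussian features with a known correlation, removes values completely at random, and compares the correlation matrix computed after mean imputation with one computed from pairwise-complete observations (a simple stand-in for estimating the parameter directly from the observed entries). The function names, sample size, and missing rate are all illustrative choices.

```python
import numpy as np
import pandas as pd


def simulate(n=2000, rho=0.8, missing_rate=0.4, seed=0):
    """Draw two correlated Gaussian features and blank out entries at random."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    data = rng.multivariate_normal(np.zeros(2), cov, size=n)
    df = pd.DataFrame(data, columns=["x", "y"])
    # Assumption: values are missing completely at random, independently per cell.
    missing = rng.random(df.shape) < missing_rate
    return df.mask(missing)


def corr_mean_imputed(df):
    """Correlation matrix after replacing each NaN with its column mean."""
    return df.fillna(df.mean()).corr()


def corr_pairwise(df):
    """Correlation matrix from pairwise-complete rows (pandas skips NaN pairs)."""
    return df.corr()


if __name__ == "__main__":
    df = simulate()
    print("mean-imputed:\n", corr_mean_imputed(df))
    print("pairwise-complete:\n", corr_pairwise(df))
```

Under these assumptions, mean imputation pulls the off-diagonal entry toward zero, because every imputed cell sits exactly at the column mean and contributes nothing to the covariance. A correlation plot drawn from the imputed matrix would therefore understate the relationship between the two features, while the pairwise-complete estimate stays close to the value used to generate the data.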

