HC, LG | Jul 16, 2025

d-DQIVAR: Data-centric Visual Analytics and Reasoning for Data Quality Improvement

arXiv:2507.11960v1 | h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses data quality improvement for machine learning practitioners, but the advance is incremental: it builds on existing visual analytics and DQ methods.

The authors address the problem of improving data quality to raise machine learning model performance. They introduce d-DQIVAR, a visual analytics system that integrates data-driven and process-driven approaches and lets users apply expert and domain knowledge. The system is demonstrated through case studies, evaluations, and user studies.

Approaches to enhancing data quality (DQ) are classified into two main categories: data- and process-driven. However, prior research has predominantly utilized batch data preprocessing within the data-driven framework, which often proves insufficient for optimizing machine learning (ML) model performance and frequently leads to distortions in data characteristics. Existing studies have primarily focused on data preprocessing rather than genuine data quality improvement (DQI). In this paper, we introduce d-DQIVAR, a novel visual analytics system designed to facilitate DQI strategies aimed at improving ML model performance. Our system integrates visual analytics techniques that leverage both data-driven and process-driven approaches. Data-driven techniques tackle DQ issues such as imputation, outlier detection, deletion, format standardization, removal of duplicate records, and feature selection. Process-driven strategies encompass evaluating DQ and DQI procedures by considering DQ dimensions and ML model performance and applying the Kolmogorov-Smirnov test. We illustrate how our system empowers users to harness expert and domain knowledge effectively within a practical workflow through case studies, evaluations, and user studies.
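The abstract names the Kolmogorov-Smirnov test as part of the process-driven evaluation but does not say how it is applied. As a minimal sketch, assuming the test is used to check whether a data-driven step (here, median imputation) distorts a feature's distribution, the two-sample KS statistic can be computed as below. The function names `ks_statistic` and `impute_median` and the sample column `raw` are hypothetical and are not taken from d-DQIVAR.

```python
from bisect import bisect_right
from statistics import median


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of a and b.
    0.0 means identical empirical distributions; 1.0 means no overlap.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical feature column with two missing values (illustration only).
raw = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0]
observed = [v for v in raw if v is not None]
imputed = impute_median(raw)

# Assumed usage: a larger statistic means the step changed the distribution more.
shift = ks_statistic(observed, imputed)
print(f"KS statistic after median imputation: {shift:.3f}")
```

A small statistic suggests the imputation left the feature's shape largely intact, while a large one flags the kind of distortion the abstract attributes to batch preprocessing. In practice a library routine such as `scipy.stats.ks_2samp` would also supply a p-value.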

