DB LG, Jul 18, 2019

A Survey of Data Quality Measurement and Monitoring Tools

arXiv:1907.08138v1, 188 citations
Originality: Synthesis-oriented
AI Analysis

The paper gives practitioners and researchers a comprehensive overview of state-of-the-art data quality tools, but its contribution is incremental: it synthesizes existing tools without introducing new methods.

This survey investigates the gap between research on data quality measurement and practical implementations. Out of 667 identified software tools, it evaluates 13 for their functional scope in data profiling, data quality metrics, and continuous monitoring. The evaluation reveals potential functional enhancements and highlights concepts, such as generally applicable data quality metrics, that are widely accepted in research but hardly implemented in the tools observed.

High-quality data is key to interpretable and trustworthy data analytics and the basis for meaningful data-driven decisions. In practical scenarios, data quality is typically associated with data preprocessing, profiling, and cleansing for subsequent tasks like data integration or data analytics. However, from a scientific perspective, a lot of research has been published about the measurement (i.e., the detection) of data quality issues, and different generally applicable data quality dimensions and metrics have been discussed. In this work, we close the gap between research into data quality measurement and practical implementations by investigating the functional scope of current data quality tools. With a systematic search, we identified 667 software tools dedicated to "data quality", from which we evaluated 13 tools with respect to three functionality areas: (1) data profiling, (2) data quality measurement in terms of metrics, and (3) continuous data quality monitoring. We selected the evaluated tools with regard to pre-defined exclusion criteria to ensure that they are domain-independent, provide the investigated functions, and can be evaluated freely or as a trial. This survey aims at a comprehensive overview of state-of-the-art data quality tools and reveals potential for their functional enhancement. Additionally, the results allow a critical discussion on concepts that are widely accepted in research but hardly implemented in any tool observed, for example, generally applicable data quality metrics.
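
To make the three functionality areas concrete, here is a minimal sketch in Python. It is not taken from the paper or from any of the surveyed tools. The function names (`profile_column`, `completeness`, `monitor`), the use of `None` to mark a missing value, and the 0.9 threshold are all assumptions made for illustration. Completeness as the share of non-missing values is one common formulation of a generally applicable metric, not necessarily the one the authors have in mind.

```python
def profile_column(values):
    """Data profiling: basic descriptive facts about one column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


def completeness(values):
    """Data quality metric: share of non-missing values, in [0, 1]."""
    if not values:
        return 1.0  # assumption: an empty column counts as complete
    return sum(v is not None for v in values) / len(values)


def monitor(snapshots, threshold=0.9):
    """Continuous monitoring: score each snapshot and flag scores below the threshold.

    `snapshots` is a list of (label, column_values) pairs, e.g. one per day.
    Returns a list of (label, score, alert) tuples.
    """
    results = []
    for label, column in snapshots:
        score = completeness(column)
        results.append((label, score, score < threshold))
    return results


if __name__ == "__main__":
    column = [1, None, 1, 2]
    print(profile_column(column))
    print(completeness(column))
    print(monitor([("day1", [1, 2]), ("day2", [1, None])]))
```

The split mirrors the abstract's distinction: profiling describes the data, a metric condenses one quality dimension into a comparable number, and monitoring repeats that measurement over time.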

