IR GL · Jan 18, 2017

First Study on Data Readiness Level

Hui Guan, Thanos Gentimis, Hamid Krim, James Keiser

arXiv:1702.02107v1

Originality: Incremental advance

AI Analysis

This paper addresses the problem of assessing data quality for data scientists, but the contribution is incremental because it builds on existing concepts of data readiness.

The authors introduce Data Readiness Level (DRL) to measure how rich a dataset is for answering a specific question. They define DRL as a function of at least four properties of data (Noisiness, Believability, Relevance, and Coherence) and validate two proposed metrics, Cosine Similarity and Document Disparity, through a text-based experiment with Twitter data.

We introduce the idea of Data Readiness Level (DRL) to measure the relative richness of data to answer specific questions often encountered by data scientists. We first approach the problem in its full generality, explaining its desired mathematical properties and applications, and then we propose and study two DRL metrics. Specifically, we define DRL as a function of at least four properties of data: Noisiness, Believability, Relevance, and Coherence. The information-theoretic metrics, Cosine Similarity and Document Disparity, are proposed as indicators of Relevance and Coherence for a piece of data. The proposed metrics are validated through a text-based experiment using Twitter data.
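To make the Relevance indicator more concrete, here is a minimal sketch of cosine similarity between a question and a piece of text, computed over raw term counts. The abstract does not specify how the paper builds its vectors, so the whitespace tokenization, the function names `term_counts` and `cosine_similarity`, and the example strings are all assumptions for illustration only. Document Disparity is not sketched because its definition is not given here.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase the text and count whitespace-separated tokens (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts.

    Returns 0.0 when either text has no tokens, so the score is always defined.
    """
    counts_a = term_counts(text_a)
    counts_b = term_counts(text_b)

    # Dot product only needs the terms that appear in both texts.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical question and tweets, not taken from the paper's data.
question = "flu outbreak in the city"
tweets = [
    "flu outbreak reported in the city today",
    "great concert last night",
]
scores = [cosine_similarity(question, tweet) for tweet in tweets]
```

With term counts, the score lies between 0 and 1: a tweet sharing no words with the question scores 0, and a tweet with identical word counts scores 1. A practical pipeline would likely add stop-word removal or TF-IDF weighting, which this sketch leaves out.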

