DB AI CY LG May 5, 2017

Data Readiness Levels

arXiv:1705.02245v1 10.845 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses project management issues for data scientists and their collaborators by providing a framework to mitigate overruns, but it is incremental in that it builds on existing concepts of readiness levels.

The paper tackles data preparation challenges in machine learning projects, such as poor collection practices and missing values, by proposing data readiness levels as a common language for assessing how prepared a data set is and for improving project management.

Application of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing values, inconvenient storage mechanisms, intellectual property, security and privacy. All these aspects obstruct the sharing and interconnection of data, and the eventual interpretation of data through machine learning or other approaches. In project reporting, a major challenge is in encapsulating these problems and enabling goals to be built around the processing of data. Project overruns can occur due to failure to account for the amount of time required to curate and collate. But to understand these failures we need to have a common language for assessing the readiness of a particular data set. This position paper proposes the use of data readiness levels: it gives a rough outline of three stages of data preparedness and speculates on how formalisation of these levels into a common language for data readiness could facilitate project management.
