AI, ML | Sep 12, 2019

Augmented Data Science: Towards Industrialization and Democratization of Data Science

arXiv:1909.05682v1
Originality: Incremental advance
AI Analysis

The paper addresses the human bottleneck in data science, with the aim of industrializing and democratizing it. The contribution appears incremental because it builds on existing statistics and ML methods.

The paper tackles the excessive manual effort that data scientists spend on data preparation and exploration. It introduces Augmented Data Science (ADS), a data-driven, domain-agnostic approach that automates these steps and augments the data scientist's judgment with automatically generated insights. A case study demonstrates the approach.

Conversion of raw data into insights and knowledge requires substantial amounts of effort from data scientists. Despite breathtaking advances in Machine Learning (ML) and Artificial Intelligence (AI), data scientists still spend the majority of their effort in understanding and then preparing the raw data for ML/AI. The effort is often manual and ad hoc, and requires some level of domain knowledge. The complexity of the effort increases dramatically when data diversity, both in form and context, increases. In this paper, we introduce our solution, Augmented Data Science (ADS), towards addressing this "human bottleneck" in creating value from diverse datasets. ADS is a data-driven approach and relies on statistics and ML to extract insights from any data set in a domain-agnostic way to facilitate the data science process. Key features of ADS are the replacement of rudimentary data exploration and processing steps with automation and the augmentation of data scientist judgment with automatically-generated insights. We present building blocks of our end-to-end solution and provide a case study to exemplify its capabilities.
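To make the idea of automating rudimentary data exploration concrete, here is a minimal sketch of domain-agnostic column profiling. It is not the ADS implementation, which the abstract does not describe. The function names `profile_column` and `profile_table`, the list-of-dicts table layout, and the rule "a column is numeric if every present value parses as a float" are all assumptions made for illustration.

```python
from statistics import mean, median


def profile_column(values):
    """Summarize one column using only its values (no domain knowledge).

    Hypothetical helper: None and "" are treated as missing.
    """
    present = [v for v in values if v is not None and v != ""]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }

    # Try to read every present value as a number; stop at the first failure.
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            break

    if present and len(numeric) == len(present):
        summary["kind"] = "numeric"
        summary["mean"] = mean(numeric)
        summary["median"] = median(numeric)
    else:
        summary["kind"] = "categorical"
    return summary


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = sorted({key for row in rows for key in row})
    return {c: profile_column([row.get(c) for row in rows]) for c in columns}


# Example with made-up records.
rows = [
    {"age": "34", "city": "Oslo"},
    {"age": "41", "city": ""},
    {"age": None, "city": "Lima"},
]
for column, summary in profile_table(rows).items():
    print(column, summary)
```

A report like this covers only the "rudimentary exploration" part of the abstract's claim. Generating insights that augment a data scientist's judgment would need further statistics or ML on top of such summaries.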

