CO · LG · ML · Mar 27, 2019

The Landscape of R Packages for Automated Exploratory Data Analysis

arXiv:1904.02101v3 · 1 citation
Originality Synthesis-oriented
AI Analysis

This work addresses data analysts' need for faster, easier insight into large, noisy datasets. It is incremental, however, because it reviews existing tools rather than proposing new methods.

The paper systematically reviews twelve popular R packages for Automated Exploratory Data Analysis (autoEDA) to identify which analysis tasks can be effectively automated and suggest future development directions.

The growing availability of large but noisy data sets with many heterogeneous variables has increased interest in automating common data analysis tasks. The most time-consuming part of this process is Exploratory Data Analysis, which is crucial for better domain understanding, data cleaning, data validation, and feature engineering. A growing number of libraries attempt to automate some of the typical Exploratory Data Analysis tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of twelve popular R packages to identify the parts of the analysis that can be effectively automated with current tools and to point out new directions for further autoEDA development.

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
