LG | AI | DB | Oct 15, 2024

ILAEDA: An Imitation Learning Based Approach for Automatic Exploratory Data Analysis

arXiv:2410.11276v1 | 1 citations | h-index: 12 | AIMLSystems
Originality: Incremental advance
AI Analysis

This work addresses automated exploratory data analysis, offering a more effective alternative to reinforcement learning-based methods. Its improvement over existing AutoEDA techniques is incremental.

The paper tackles the challenge of automating exploratory data analysis (AutoEDA) by proposing an imitation learning approach that learns from expert sessions, bypassing the need for manually defined reward functions. It outperforms the state-of-the-art method by up to 3x on benchmarks, demonstrating strong generalization across datasets.

Automating end-to-end Exploratory Data Analysis (AutoEDA) is a challenging open problem, often tackled through Reinforcement Learning (RL) by learning to predict a sequence of analysis operations (FILTER, GROUP, etc.). Defining rewards for each operation is difficult, and existing methods rely on various interestingness measures to craft reward functions that capture the importance of each operation. In this work, we argue that not all of the essential features of what makes an operation important can be accurately captured mathematically using rewards. We propose an AutoEDA model trained through imitation learning from expert EDA sessions, bypassing the need for manually defined interestingness measures. Our method, based on generative adversarial imitation learning (GAIL), generalizes well across datasets, even with limited expert data. We also introduce a novel approach for generating synthetic EDA demonstrations for training. Our method outperforms the existing state-of-the-art end-to-end EDA approach on benchmarks by up to 3x, showing strong performance and generalization, while naturally capturing diverse interestingness measures in generated EDA sessions.
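
To make the idea of replacing hand-crafted rewards with a learned signal more concrete, the following is a minimal sketch of the discriminator side of GAIL on toy data. It is not the authors' implementation. The two-dimensional one-hot encoding over FILTER and GROUP, the logistic-regression discriminator, the function names, and the reward convention (the discriminator outputs the probability that a step is expert-like, and the policy receives -log(1 - D)) are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discriminator(weights, bias, features):
    # Probability that a (state, action) feature vector came from an expert session.
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_discriminator(expert, policy, dim, steps=500, lr=0.5):
    # Logistic regression by gradient ascent: expert steps get label 1, policy steps label 0.
    weights, bias = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in policy]
    n = len(data)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = y - discriminator(weights, bias, x)  # gradient of the log-likelihood
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias

def surrogate_reward(weights, bias, features, eps=1e-8):
    # Learned reward for the policy: larger when the step looks expert-like.
    d = discriminator(weights, bias, features)
    return -math.log(1.0 - d + eps)

# Hypothetical toy features: [is_filter, is_group].
FILTER, GROUP = [1.0, 0.0], [0.0, 1.0]
expert_steps = [FILTER] * 8 + [GROUP] * 2   # pretend experts mostly FILTER here
policy_steps = [FILTER] * 2 + [GROUP] * 8   # pretend the current policy mostly GROUPs

w, b = train_discriminator(expert_steps, policy_steps, dim=2)
print("reward(FILTER):", surrogate_reward(w, b, FILTER))
print("reward(GROUP): ", surrogate_reward(w, b, GROUP))
```

In this toy setting the learned reward is higher for the operation that experts choose more often, without any interestingness measure being written by hand. A full GAIL loop would alternate this discriminator update with a policy-gradient step that maximizes the surrogate reward.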
