DB, IR · Oct 29, 2015

AIDE: An Automated Sample-based Approach for Interactive Data Exploration

arXiv:1510.08897v1 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of deriving insights from big data in domains such as science and healthcare by automating data exploration. The advance is incremental, since it builds on existing classification and data management techniques.

The paper tackles the problem of reducing human effort in exploring complex datasets by introducing AIDE, an automated interactive data exploration framework that uses classification and optimization to learn user interests from feedback on samples. It achieves highly accurate query predictions for common and complex queries with interactive performance, limiting wait times to less than a few seconds per iteration.

In this paper, we argue that database systems should be augmented with an automated data exploration service that methodically steers users through the data in a meaningful way. Such an automated system is crucial for deriving insights from complex datasets found in many big data applications, such as scientific and healthcare applications, as well as for reducing the human effort of data exploration. Towards this end, we present AIDE, an Automatic Interactive Data Exploration framework that assists users in discovering new interesting data patterns and eliminates expensive ad-hoc exploratory queries. AIDE relies on a seamless integration of classification algorithms and data management optimization techniques that collectively strive to accurately learn the user's interests based on their relevance feedback on strategically collected samples. We present a number of exploration techniques as well as optimizations that minimize the number of samples presented to the user while offering interactive performance. AIDE can deliver highly accurate query predictions for very common conjunctive queries with little user effort, and, given a reasonable number of samples, it can predict complex disjunctive queries with high accuracy. It provides interactive performance, as it limits the user wait time per iteration of exploration to less than a few seconds.
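The feedback loop described in the abstract (show samples, collect relevance labels, refit a model, pick the next samples) can be illustrated with a minimal sketch. This is not AIDE's implementation: the abstract does not name the classifier or the sampling strategy, so the sketch substitutes a deliberately simple bounding-box learner that only handles a single conjunctive range query, and it picks new samples closest to the current predicted boundary. All names (`explore`, `fit_box`, `in_box`, `boundary_distance`) and all parameter values are hypothetical.

```python
import random


def fit_box(points):
    """Smallest axis-aligned box containing all points: one (lo, hi) pair per dimension."""
    dims = range(len(points[0]))
    return [(min(p[d] for p in points), max(p[d] for p in points)) for d in dims]


def in_box(p, box):
    """True if point p satisfies every range predicate of the box."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, box))


def boundary_distance(p, box):
    """Chebyshev distance from p to the box boundary (works inside and outside)."""
    return abs(max(max(lo - x, x - hi) for x, (lo, hi) in zip(p, box)))


def explore(data, is_relevant, iterations=20, batch=10, seed=0):
    """Assumed feedback loop: sample, ask the user, refit, repeat.

    is_relevant stands in for the user's relevance feedback.
    Returns the predicted range query (or None) and the number of labels used.
    """
    rng = random.Random(seed)
    labeled = {}  # index into data -> user label
    box = None
    for _ in range(iterations):
        unlabeled = [i for i in range(len(data)) if i not in labeled]
        if not unlabeled:
            break
        if box is None:
            # No relevant sample seen yet: explore at random.
            picks = rng.sample(unlabeled, min(batch, len(unlabeled)))
        else:
            # Refine: ask about the points nearest the predicted boundary.
            unlabeled.sort(key=lambda i: boundary_distance(data[i], box))
            picks = unlabeled[:batch]
        for i in picks:
            labeled[i] = is_relevant(data[i])
        positives = [data[i] for i, y in labeled.items() if y]
        if positives:
            box = fit_box(positives)
    return box, len(labeled)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.random(), rng.random()) for _ in range(2000)]
    # Hypothetical user interest: a conjunctive range query on both attributes.
    target = lambda p: 0.3 <= p[0] <= 0.7 and 0.3 <= p[1] <= 0.7
    box, n_labels = explore(data, target)
    print("predicted ranges:", box, "labels used:", n_labels)
```

Because the predicted box is the bounding box of relevant samples, it never includes an irrelevant point when the true interest is itself a single range query; it can only miss relevant points near the edges, which is why the loop concentrates labels at the boundary. A disjunctive interest (several separate regions) would need a richer classifier than this sketch.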

