HC · Sep 29, 2017

Foresight: Rapid Data Exploration Through Guideposts

arXiv:1709.10513v1 · 35 citations
Originality: Incremental advance
AI Analysis

The paper addresses the overwhelming number of choices that data scientists face during data exploration. The advance appears incremental, since it builds on existing visualization and recommender-system concepts.

The paper tackles the challenge of exploratory data analysis (EDA) for large, complex datasets by introducing Foresight, a visualization recommender system. Foresight uses "guideposts" (visualizations of pronounced statistical descriptors) to help users explore data rapidly, reducing the need to select attributes and encodings manually.

Current tools for exploratory data analysis (EDA) require users to manually select data attributes, statistical computations and visual encodings. This can be daunting for large-scale, complex data. We introduce Foresight, a visualization recommender system that helps the user rapidly explore large high-dimensional datasets through "guideposts." A guidepost is a visualization corresponding to a pronounced instance of a statistical descriptor of the underlying data, such as a strong linear correlation between two attributes, high skewness or concentration about the mean of a single attribute, or a strong clustering of values. For each descriptor, Foresight initially presents visualizations of the "strongest" instances, based on an appropriate ranking metric. Given these initial guideposts, the user can then look at "nearby" guideposts by issuing "guidepost queries" containing constraints on metric type, metric strength, data attributes, and data values. Thus, the user can directly explore the network of guideposts, rather than the overwhelming space of data attributes and visual encodings. Foresight also provides for each descriptor a global visualization of ranking-metric values to both help orient the user and ensure a thorough exploration process. Foresight facilitates interactive exploration of large datasets using fast, approximate sketching to compute ranking metrics. We also contribute insights on EDA practices of data scientists, summarizing results from an interview study we conducted to inform the design of Foresight.
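The ranking idea in the abstract can be illustrated with a small sketch. This is not Foresight's implementation: it computes exact Pearson correlation and sample skewness with NumPy rather than the fast approximate sketching the abstract mentions, and the function names (`rank_correlation_guideposts`, `rank_skewness_guideposts`, `guidepost_query`), the tuple layout, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rank_correlation_guideposts(data, names, top_k=3):
    """Rank attribute pairs by |Pearson r|, strongest first (exact, not sketched)."""
    corr = np.corrcoef(data, rowvar=False)  # columns are attributes
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            pairs.append((names[i], names[j], float(corr[i, j])))
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[:top_k]


def rank_skewness_guideposts(data, names, top_k=3):
    """Rank single attributes by |sample skewness|, strongest first."""
    centered = data - data.mean(axis=0)
    std = data.std(axis=0)  # assumes no constant column (std > 0)
    skew = (centered ** 3).mean(axis=0) / std ** 3
    ranked = sorted(zip(names, skew.tolist()), key=lambda p: abs(p[1]), reverse=True)
    return ranked[:top_k]


def guidepost_query(pairs, attribute=None, min_strength=0.0):
    """Hypothetical 'guidepost query': filter ranked pairs by attribute and strength."""
    return [
        p for p in pairs
        if (attribute is None or attribute in p[:2]) and abs(p[2]) >= min_strength
    ]


# Synthetic data (an assumption for the demo): y tracks x closely, w is skewed.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.1, size=500)
z = rng.normal(size=500)
w = rng.exponential(size=500)
data = np.column_stack([x, y, z, w])
names = ["x", "y", "z", "w"]

all_pairs = rank_correlation_guideposts(data, names, top_k=6)
print(all_pairs[:1])                                   # strongest correlation guidepost
print(rank_skewness_guideposts(data, names, top_k=1))  # strongest skewness guidepost
print(guidepost_query(all_pairs, attribute="z"))       # "nearby" guideposts involving z
```

Ranking by absolute value treats strong negative and strong positive relationships as equally pronounced. A system aimed at interactive speeds on large data would replace the exact computations here with approximations, as the abstract describes.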

