ML · HC · LG · Apr 9, 2018

Human-Guided Data Exploration

arXiv:1804.03194v1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific problem for data analysts by providing incremental improvements to interactive data exploration tools, enabling more targeted investigation of data subsets and hypotheses.

The paper tackles a limitation of existing interactive data mining systems: users cannot steer exploration towards specific questions. It introduces the Human Guided Data Exploration framework, which allows users to incorporate knowledge, focus on subsets, and compare hypotheses. The evaluation on real-world datasets indicates that these additions are important for the interactive iterative data mining process.

The outcome of the exploratory data analysis (EDA) phase is vital for successful data analysis. EDA is more effective when the user interacts with the system used to carry out the exploration. In the recently proposed paradigm of iterative data mining, the user controls the exploration by inputting knowledge in the form of patterns observed during the process. The system then shows the user views of the data that are maximally informative given the user's current knowledge. Although this scheme is good at showing the user surprising views of the data, it has a clear shortcoming: the user cannot steer the process. In many real cases, the user wants to investigate specific questions concerning the data. This paper presents the Human Guided Data Exploration framework, which generalises previous research. The framework allows the user to incorporate existing knowledge into the exploration process, focus on exploring a subset of the data, and compare different complex hypotheses concerning relations in the data. The framework utilises a computationally efficient constrained randomisation scheme. To showcase the framework, we developed a free open-source tool and used it to carry out the empirical evaluation on real-world datasets. Our evaluation shows that the abilities to focus on particular subsets and to compare hypotheses are important additions to the interactive iterative data mining process.
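
The abstract does not spell out the constrained randomisation scheme, so the following is only a minimal sketch of one common way such a scheme can work, not the authors' implementation. It assumes that a constraint is a hypothetical "tile", a set of rows and a set of columns whose joint relation should be preserved: rows inside a tile are shuffled with one shared permutation across the tile's columns, while all other cells are shuffled independently per column. The function name `constrained_permutation`, the `tiles` argument, and the restriction to non-overlapping tiles are assumptions made for illustration.

```python
import numpy as np


def constrained_permutation(data, tiles, rng):
    """Return a randomised copy of `data` that respects tile constraints.

    Hypothetical sketch: `tiles` is a list of (rows, cols) pairs that are
    assumed not to overlap. Within a tile, all columns share one row
    permutation, so relations between those columns are preserved there.
    Cells outside every tile are permuted independently per column.
    """
    n_rows, n_cols = data.shape
    sample = data.copy()
    covered = np.zeros((n_rows, n_cols), dtype=bool)

    for rows, cols in tiles:
        rows = np.asarray(rows)
        cols = np.asarray(cols)
        shuffled = rng.permutation(rows)
        # One shared permutation for every column of the tile.
        sample[np.ix_(rows, cols)] = data[np.ix_(shuffled, cols)]
        covered[np.ix_(rows, cols)] = True

    for j in range(n_cols):
        # Unconstrained cells of column j are shuffled among themselves.
        free = np.flatnonzero(~covered[:, j])
        sample[free, j] = data[rng.permutation(free), j]

    return sample


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10, 3))
    # Hypothetical user knowledge: columns 0 and 1 are related in rows 0-4.
    tiles = [(range(0, 5), [0, 1])]
    print(constrained_permutation(data, tiles, rng))
```

In this sketch every column keeps its marginal distribution, and the only structure that survives randomisation is what the tiles encode. Comparing the real data against such samples is one way to look for views in which the data differs from what the stated knowledge already explains.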

