Steven Drucker

AI
5papers
148citations
Novelty36%
AI Score39

5 Papers

AISep 27, 2024
Data Analysis in the Era of Generative AI

Jeevana Priya Inala, Chenglong Wang, Steven Drucker et al. · microsoft-research

This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges. We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow by translating high-level user intentions into executable code, charts, and insights. We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps. Finally, we discuss the research challenges that impede the development of these AI-based systems such as enhancing model capabilities, evaluating and benchmarking, and understanding end-user needs.

37.5AIApr 8
Bridging Natural Language and Interactive What-If Interfaces via LLM-Generated Declarative Specification

Sneha Gathani, Sirui Zeng, Diya Patel et al. · mit

What-if analysis (WIA) is an iterative, multi-step process where users explore and compare hypothetical scenarios by adjusting parameters, applying constraints, and scoping data through interactive interfaces. Current tools fall short of supporting effective interactive WIA: spreadsheet and BI tools require time-consuming and laborious setup, while LLM-based chatbot interfaces are semantically fragile, frequently misinterpret intent, and produce inconsistent results as conversations progress. To address these limitations, we present a two-stage workflow that translates natural language (NL) WIA questions into interactive visual interfaces via an intermediate representation, powered by the Praxa Specification Language (PSL): first, LLMs generate PSL specifications from NL questions capturing analytical intent and logic, enabling validation and repair of erroneous specifications; and second, the specifications are compiled into interactive visual interfaces with parameter controls and linked visualizations. We benchmark this workflow with 405 WIA questions spanning 11 WIA types, 5 datasets, and 3 state-of-the-art LLMs. The results show that across models, half of specifications (52.42%) are generated correctly without intervention. We perform an analysis of the failure cases and derive an error taxonomy spanning non-functional errors (specifications fail to compile) and functional errors (specifications compile but misrepresent intent). Based on the taxonomy, we apply targeted repairs on the failure cases using few-shot prompts and improve the success rate to 80.42%. Finally, we show how undetected functional errors propagate through compilation into plausible but misleading interfaces, demonstrating that the intermediate specification is critical for reliably bridging NL and interactive WIA interface in LLM-powered WIA systems.

HCAug 28, 2024
Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way

Chenglong Wang, Bongshin Lee, Steven Drucker et al.

Data analysts often need to iterate between data transformations and chart designs to create rich visualizations for exploratory data analysis. Although many AI-powered systems have been introduced to reduce the effort of visualization authoring, existing systems are not well suited for iterative authoring. They typically require analysts to provide, in a single turn, a text-only prompt that fully describe a complex visualization. We introduce Data Formulator 2 (DF2 for short), an AI-powered visualization system designed to overcome this limitation. DF2 blends graphical user interfaces and natural language inputs to enable users to convey their intent more effectively, while delegating data transformation to AI. Furthermore, to support efficient iteration, DF2 lets users navigate their iteration history and reuse previous designs, eliminating the need to start from scratch each time. A user study with eight participants demonstrated that DF2 allowed participants to develop their own iteration styles to complete challenging data exploration sessions.

LGJan 5, 2020Code
A System for Real-Time Interactive Analysis of Deep Learning Training

Shital Shah, Roland Fernandez, Steven Drucker

Performing diagnosis or exploratory analysis during the training of deep learning models is challenging but often necessary for making a sequence of decisions guided by the incremental observations. Currently available systems for this purpose are limited to monitoring only the logged data that must be specified before the training process starts. Each time a new information is desired, a cycle of stop-change-restart is required in the training process. These limitations make interactive exploration and diagnosis tasks difficult, imposing long tedious iterations during the model development. We present a new system that enables users to perform interactive queries on live processes generating real-time information that can be rendered in multiple formats on multiple surfaces in the form of several desired visualizations simultaneously. To achieve this, we model various exploratory inspection and diagnostic tasks for deep learning training processes as specifications for streams using a map-reduce paradigm with which many data scientists are already familiar. Our design achieves generality and extensibility by defining composable primitives which is a fundamentally different approach than is used by currently available systems. The open source implementation of our system is available as TensorWatch project at https://github.com/microsoft/tensorwatch.

HCAug 31, 2020
Data Visceralization: Enabling Deeper Understanding of Data Using Virtual Reality

Benjamin Lee, Dave Brown, Bongshin Lee et al.

A fundamental part of data visualization is transforming data to map abstract information onto visual attributes. While this abstraction is a powerful basis for data visualization, the connection between the representation and the original underlying data (i.e., what the quantities and measurements actually correspond with in reality) can be lost. On the other hand, virtual reality (VR) is being increasingly used to represent real and abstract models as natural experiences to users. In this work, we explore the potential of using VR to help restore the basic understanding of units and measures that are often abstracted away in data visualization in an approach we call data visceralization. By building VR prototypes as design probes, we identify key themes and factors for data visceralization. We do this first through a critical reflection by the authors, then by involving external participants. We find that data visceralization is an engaging way of understanding the qualitative aspects of physical measures and their real-life form, which complements analytical and quantitative understanding commonly gained from data visualization. However, data visceralization is most effective when there is a one-to-one mapping between data and representation, with transformations such as scaling affecting this understanding. We conclude with a discussion of future directions for data visceralization.