72.9PLMay 2
FlowBook: Enforcing Reproducibility in Computational NotebooksStephen N. Freund, Emery D. Berger, Cormac Flanagan et al.
Computational notebooks are notoriously prone to reproducibility failures. By permitting out-of-order cell execution, notebooks accumulate hidden state and implicit dependencies that cause interactive executions to silently diverge from clean top-to-bottom runs. Prior approaches either employ dependency analyses or enforce reactive dataflow models that face fundamental tradeoffs among expressiveness, precision, and performance. This paper exploits the insight that reproducibility can be enforced without precise dependency tracking: a notebook is reproducible if and only if executing its cells in top-to-bottom order from an empty store produces exactly the outputs currently recorded. We formalize this notion of reproducibility and present FlowBook, which implements a dynamic analysis that enforces reproducibility by tracking read and write sets at cell boundaries. FlowBook detects stale cells whose recorded outputs may no longer reflect the current notebook state and prevents operations that would violate reproducibility. FlowBook incurs near-imperceptible latency overhead (median: 70 ms).
22.0PLApr 23
Causality and Semantic SeparationAnna Zhang, Qinglan Luo, London Bielicke et al.
The design of scientific experiments deserves its own variation of formal verification to catch cases where scientists made important mistakes, such as forgetting to take confounding variables into account. One of the most fundamental underpinnings of science is causality, or what it means for interventions in the world to cause other outcomes, as formalized by computer scientists like Judea Pearl. However, these ideas had not previously been made rigorous to the standards of the programming-languages community, where one expects a (syntactic) program analysis to be proved sound with respect to a natural semantics. In the domain of causality, as the relevant "program analysis," we focus on $d$-separation, a classic condition on graphs that can be used to decide when the design of an experiment controls for sufficiently many confounding variables, even though the reason that this condition works is often unintuitive. Our central result (mechanized in Rocq) is that $d$-separation exactly coincides with a novel semantic definition inspired by noninterference from the theory of security. This characterization provides a structural semantic foundation for $d$-separation and helps explain why the graph-theoretic condition is correct, independently of probabilistic assumptions. For each given automated test on the quality of an experiment design, our theorem justifies an associated method for falsifying the world-modeling hypothesis behind the experiment.
23.2HCApr 14
PLanet: Formalizing and Analyzing Assignment Procedures in the Design of ExperimentsLondon Bielicke, Anna Zhang, Shruti Tyagi et al.
Experimental designs reflect assumptions about variable relationships that determine what causal queries researchers can answer through the experiment. Accounting for and communicating these assumptions is essential for drawing valid, generalizable conclusions from scientific experiments. Unfortunately, existing experimental design tools elide these details, expecting researchers to reason about design decisions and assumptions on their own. To surface assumptions and enable design exploration, we introduce a grammar of composable operators for constructing experimental assignment procedures grounded in matrix algebra. The PLanet DSL implements this grammar and compiles PLanet programs into constraint satisfaction problems over matrices. Together, PLanet's composable grammar and matrix representation enable a static analysis to determine which causal queries are testable under different assumptions. In an expressivity evaluation, PLanet was the most expressive of existing DSLs. Critical reflections with the authors of these DSLs revealed that PLanet makes design choices explicit without requiring procedural specification. Think-aloud studies showed that PLanet facilitated design exploration and surfaced assumptions researchers may otherwise overlook.
HCOct 25, 2023
rTisane: Externalizing conceptual models for data analysis increases engagement with domain knowledge and improves statistical model qualityEunice Jun, Edward Misback, Jeffrey Heer et al.
Statistical models should accurately reflect analysts' domain knowledge about variables and their relationships. While recent tools let analysts express these assumptions and use them to produce a resulting statistical model, it remains unclear what analysts want to express and how externalization impacts statistical model quality. This paper addresses these gaps. We first conduct an exploratory study of analysts using a domain-specific language (DSL) to express conceptual models. We observe a preference for detailing how variables relate and a desire to allow, and then later resolve, ambiguity in their conceptual models. We leverage these findings to develop rTisane, a DSL for expressing conceptual models augmented with an interactive disambiguation process. In a controlled evaluation, we find that rTisane's DSL helps analysts engage more deeply with and accurately externalize their assumptions. rTisane also leads to statistical models that match analysts' assumptions, maintain analysis intent, and better fit the data.
HCDec 23, 2025
Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and OutputsCyrus Vachha, Yixiao Kang, Zach Dive et al.
Authoring 3D scenes is a central task for spatial computing applications. Competing visions for lowering existing barriers are (1) focus on immersive, direct manipulation of 3D content or (2) leverage AI techniques that capture real scenes (3D Radiance Fields such as, NeRFs, 3D Gaussian Splatting) and modify them at a higher level of abstraction, at the cost of high latency. We unify the complementary strengths of these approaches and investigate how to integrate generative AI advances into real-time, immersive 3D Radiance Field editing. We introduce Dreamcrafter, a VR-based 3D scene editing system that: (1) provides a modular architecture to integrate generative AI algorithms; (2) combines different levels of control for creating objects, including natural language and direct manipulation; and (3) introduces proxy representations that support interaction during high-latency operations. We contribute empirical findings on control preferences and discuss how generative AI interfaces beyond text input enhance creativity in scene editing and world building.
HCApr 18, 2025
Flowco: Rethinking Data Analysis in the Age of LLMsStephen N. Freund, Brooke Simon, Emery D. Berger et al.
Conducting data analysis typically involves authoring code to transform, visualize, analyze, and interpret data. Large language models (LLMs) are now capable of generating such code for simple, routine analyses. LLMs promise to democratize data science by enabling those with limited programming expertise to conduct data analyses, including in scientific research, business, and policymaking. However, analysts in many real-world settings must often exercise fine-grained control over specific analysis steps, verify intermediate results explicitly, and iteratively refine their analytical approaches. Such tasks present barriers to building robust and reproducible analyses using LLMs alone or even in conjunction with existing authoring tools (e.g., computational notebooks). This paper introduces Flowco, a new mixed-initiative system to address these challenges. Flowco leverages a visual dataflow programming model and integrates LLMs into every phase of the authoring process. A user study suggests that Flowco supports analysts, particularly those with less programming experience, in quickly authoring, debugging, and refining data analyses.
AIJan 7, 2022
Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data RelationshipsEunice Jun, Audrey Seo, Jeffrey Heer et al.
Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. For instance, generalized linear mixed-effects models (GLMMs) help answer complex research questions, but omitting random effects impairs the generalizability of results. To address this need, we present Tisane, a mixed-initiative system for authoring generalized linear models with and without mixed-effects. Tisane introduces a study design specification language for expressing and asking questions about relationships between variables. Tisane contributes an interactive compilation process that represents relationships in a graph, infers candidate statistical models, and asks follow-up questions to disambiguate user queries to construct a valid model. In case studies with three researchers, we find that Tisane helps them focus on their goals and assumptions while avoiding past mistakes.
OHApr 6, 2021
Hypothesis Formalization: Empirical Findings, Software Limitations, and Design ImplicationsEunice Jun, Melissa Birchfield, Nicole de Moura et al.
Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analysis to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process balancing conceptual and statistical considerations constrained by data and computation, and discuss implications for future tools.
PLApr 10, 2019
Tea: A High-level Language and Runtime System for Automating Statistical AnalysisEunice Jun, Maureen Daum, Jared Roesch et al.
Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests.