OH HC SE

Apr 6, 2021

Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications

Eunice Jun, Melissa Birchfield, Nicole de Moura, Jeffrey Heer, Rene Just

arXiv:2104.02712v17.321 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses challenges in data analysis for researchers and analysts, but it is incremental as it builds on existing understanding of hypothesis formalization.

The study investigated the process of translating research hypotheses into statistical models, finding that analysts often fixate on implementation and use familiar but sub-optimal approaches, while software tools provide inconsistent abstractions that limit model choices.

Data analysis requires translating higher-level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analysis to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process balancing conceptual and statistical considerations constrained by data and computation, and discuss implications for future tools.
