AI · HC · PL · CO · OT · Jan 7, 2022

Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships

arXiv:2201.02705v1 · 28 citations
AI Analysis

This paper addresses the problem of helping data analysts author statistical models more accurately. It appears incremental, since it builds on existing GLMM frameworks and adds new tool support.

The authors tackled the problem of data analysts lacking tool support for integrating domain assumptions, data collection, and modeling choices in statistical modeling, which can lead to mistakes compromising scientific validity. They presented Tisane, a mixed-initiative system for authoring generalized linear models, and found in case studies with three researchers that it helps them focus on goals and assumptions while avoiding past mistakes.

Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. For instance, generalized linear mixed-effects models (GLMMs) help answer complex research questions, but omitting random effects impairs the generalizability of results. To address this need, we present Tisane, a mixed-initiative system for authoring generalized linear models with and without mixed-effects. Tisane introduces a study design specification language for expressing and asking questions about relationships between variables. Tisane contributes an interactive compilation process that represents relationships in a graph, infers candidate statistical models, and asks follow-up questions to disambiguate user queries to construct a valid model. In case studies with three researchers, we find that Tisane helps them focus on their goals and assumptions while avoiding past mistakes.
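The compilation idea in the abstract (record relationships in a graph, then infer a candidate model for a query) can be illustrated with a minimal sketch. This is not Tisane's API: the `StudyGraph` class, the `candidate_model` and `to_formula` helpers, the variable names, and the simplified rule "add a random intercept for every group the outcome's unit is nested within" are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class StudyGraph:
    """Toy store of conceptual and data relationships (hypothetical, not Tisane's API)."""
    causes: set = field(default_factory=set)          # directed (cause, effect) pairs
    associates: set = field(default_factory=set)      # association pairs, stored as (a, b)
    nests: dict = field(default_factory=dict)         # unit -> group it is nested within
    measured_on: dict = field(default_factory=dict)   # variable -> unit it was measured on


def candidate_model(graph, dv, ivs):
    """Return (main_effects, grouping_factors) for a query about dv and ivs."""
    # Keep only predictors the analyst has declared as related to the outcome.
    main = [
        iv for iv in ivs
        if (iv, dv) in graph.causes
        or (iv, dv) in graph.associates
        or (dv, iv) in graph.associates
    ]
    # Walk up the nesting chain from the unit the outcome was measured on.
    groups = []
    unit = graph.measured_on[dv]
    while unit in graph.nests:
        unit = graph.nests[unit]
        groups.append(unit)
    return main, groups


def to_formula(dv, main, groups):
    """Render the candidate model as an lme4-style formula string."""
    terms = list(main) + [f"(1 | {g})" for g in groups]
    rhs = " + ".join(terms) if terms else "1"
    return f"{dv} ~ {rhs}"


# Hypothetical study: students nested within classrooms.
graph = StudyGraph(
    causes={("tutoring", "score")},
    nests={"student": "classroom"},
    measured_on={"score": "student", "tutoring": "student", "age": "student"},
)
main, groups = candidate_model(graph, dv="score", ivs=["tutoring", "age"])
formula = to_formula("score", main, groups)
print(formula)
```

In this toy version, `age` is dropped silently because no relationship to `score` was declared. The abstract describes an interactive process that asks follow-up questions at such points, which this sketch does not attempt to reproduce.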

Code Implementations: 1 repo
