HCCYLGJul 3, 2025

Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks

arXiv:2507.02819v33 citationsh-index: 22Proc. ACM Hum. Comput. Interact.
Originality Synthesis-oriented
AI Analysis

This addresses the challenge of defining target variables in data science for domains like education and healthcare, but it is incremental as it builds on existing qualitative research.

The study investigated how data scientists translate fuzzy concepts into concrete target variables for predictive modeling, finding they use a bricolage process to meet criteria like validity and predictability through adaptive strategies.

Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the "authenticity" of student writing or the "healthcare need" of a patient. Yet the process by which data scientists translate fuzzy concepts into a concrete, proxy target variable remains poorly understood. We interview fifteen data scientists in education (N=8) and healthcare (N=7) to understand how they construct target variables for predictive modeling tasks. Our findings suggest that data scientists construct target variables through a bricolage process, in which they use creative and pragmatic approaches to make do with the limited data at hand. Data scientists attempt to satisfy five major criteria for a target variable through bricolage: validity, simplicity, predictability, portability, and resource requirements. To achieve this, data scientists adaptively apply problem (re)formulation strategies, such as swapping out one candidate target variable for another when the first fails to meet certain criteria (e.g., predictability), or composing multiple outcomes into a single target variable to capture a more holistic set of modeling objectives. Based on our findings, we present opportunities for future HCI, CSCW, and ML research to better support the art and science of target variable construction.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes