Kyle Heuton

LG
h-index: 7
Papers: 3
Citations: 1
Novelty: 40%
AI Score: 34

3 Papers

LG, Oct 21, 2025
Partial VOROS: A Cost-aware Performance Metric for Binary Classifiers with Precision and Capacity Constraints

Christopher Ratigan, Kyle Heuton, Carissa Wang et al.

The ROC curve is widely used to assess binary classification performance. Yet for some applications such as alert systems for hospitalized patient monitoring, conventional ROC analysis cannot capture crucial factors that impact deployment, such as enforcing a minimum precision constraint to avoid false alarm fatigue or imposing an upper bound on the number of predicted positives to represent the capacity of hospital staff. The usual area under the curve metric also does not reflect asymmetric costs for false positives and false negatives. In this paper we address all three of these issues. First, we show how the subset of classifiers that meet given precision and capacity constraints can be represented as a feasible region in ROC space. We establish the geometry of this feasible region. We then define the partial area of lesser classifiers, a performance metric that is monotonic with cost and only accounts for the feasible portion of ROC space. Averaging this area over a desired range of cost parameters results in the partial volume over the ROC surface, or partial VOROS. In experiments predicting mortality risk using vital sign history on the MIMIC-IV dataset, we show this cost-aware metric is better than alternatives for ranking classifiers in hospital alert applications.
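
To illustrate how precision and capacity constraints can carve out a feasible region in ROC space, here is a minimal sketch. It is not the paper's code. It assumes a known positive-class prevalence and expresses capacity as the maximum fraction of cases that may be flagged positive; the function name `is_feasible` and its parameters are hypothetical. With prevalence pi, precision is pi*TPR / (pi*TPR + (1-pi)*FPR), so a minimum-precision constraint is a line through the origin of ROC space, and a capacity constraint pi*TPR + (1-pi)*FPR <= c is a line with negative slope.

```python
def is_feasible(tpr, fpr, prevalence, min_precision, max_positive_rate):
    """Check whether an ROC point satisfies both constraints.

    All names here are illustrative assumptions, not the paper's API.
    tpr, fpr          : true and false positive rates of a classifier
    prevalence        : fraction of cases that are truly positive
    min_precision     : lowest acceptable precision
    max_positive_rate : capacity, as the largest fraction of cases
                        that may be predicted positive
    """
    tp = prevalence * tpr            # true positives per case
    fp = (1.0 - prevalence) * fpr    # false positives per case

    # precision >= p  is equivalent to  tp * (1 - p) >= p * fp,
    # which avoids dividing by zero at the ROC origin.
    meets_precision = tp * (1.0 - min_precision) >= min_precision * fp

    # Capacity: the total fraction predicted positive must fit the budget.
    meets_capacity = (tp + fp) <= max_positive_rate

    return meets_precision and meets_capacity


# Example with 10% prevalence, precision >= 0.5, and at most 20% flagged.
conservative = is_feasible(0.6, 0.05, 0.1, 0.5, 0.2)
aggressive = is_feasible(0.9, 0.30, 0.1, 0.5, 0.2)
```

In this example the conservative operating point lies inside the feasible region, while the aggressive one violates both constraints. Treating the ROC origin (no positive predictions) as feasible is a choice made in this sketch.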

ED-PH, Mar 19, 2025
Combining physics education and machine learning research to measure evidence of students' mechanistic sensemaking

Kaitlin Gili, Kyle Heuton, Astha Shah et al.

Advances in machine learning (ML) offer new possibilities for science education research. We report on early progress in the design of an ML-based tool to analyze students' mechanistic sensemaking, working from a coding scheme that is aligned with previous work in physics education research (PER) and amenable to recently developed ML classification strategies using language encoders. We describe pilot tests of the tool, in three versions with different language encoders, to analyze sensemaking evident in college students' written responses to brief conceptual questions. The results show, first, that the tool's measurements of sensemaking can achieve useful agreement with a human coder, and, second, that encoder design choices entail a tradeoff between accuracy and computational expense. We discuss the promise and limitations of this approach, providing examples of how this measurement scheme may serve PER in the future. We conclude with reflections on the use of ML to support PER, with cautious optimism for strategies of co-design between PER and ML.

LG, Mar 7, 2025
Decision-aware training of spatiotemporal forecasting models to select a top K subset of sites for intervention

Kyle Heuton, F. Samuel Muench, Shikhar Shrestha et al.

Optimal allocation of scarce resources is a common problem for decision makers faced with choosing a limited number of locations for intervention. Spatiotemporal prediction models could make such decisions data-driven. A recent performance metric called fraction of best possible reach (BPR) measures the impact of using a model's recommended size K subset of sites compared to the best possible top-K in hindsight. We tackle two open problems related to BPR. First, we explore how to rank all sites numerically given a probabilistic model that predicts event counts jointly across sites. Ranking via the per-site mean is suboptimal for BPR. Instead, we offer a better ranking for BPR backed by decision theory. Second, we explore how to train a probabilistic model's parameters to maximize BPR. Discrete selection of K sites implies all-zero parameter gradients which prevent standard gradient training. We overcome this barrier via advances in perturbed optimizers. We further suggest a training objective that combines likelihood with a decision-aware BPR constraint to deliver high-quality top-K rankings as well as good forecasts for all sites. We demonstrate our approach on two where-to-intervene applications: mitigating opioid-related fatal overdoses for public health and monitoring endangered wildlife.
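
As a rough illustration of the BPR idea described above, the sketch below compares the events reached at a model's top-K sites with the events reached at the best possible top-K chosen in hindsight. The exact formula is an assumption based on the abstract's wording, not taken from the paper, and the function name `fraction_best_possible_reach` is hypothetical.

```python
import numpy as np


def fraction_best_possible_reach(scores, true_counts, k):
    """Assumed BPR: reach of the model's top-k over the hindsight-best reach.

    scores      : per-site ranking scores produced by a model
    true_counts : observed event counts per site
    k           : number of sites that can receive the intervention
    """
    scores = np.asarray(scores, dtype=float)
    true_counts = np.asarray(true_counts, dtype=float)

    # Sites the model would select: the k highest scores.
    chosen = np.argsort(-scores, kind="stable")[:k]
    # Sites an oracle would select: the k highest observed counts.
    best = np.argsort(-true_counts, kind="stable")[:k]

    best_reach = true_counts[best].sum()
    if best_reach == 0:
        # Assumption of this sketch: with no events anywhere,
        # any selection counts as optimal.
        return 1.0
    return float(true_counts[chosen].sum() / best_reach)


# Four sites, intervene at two of them.
bpr = fraction_best_possible_reach(
    scores=[0.9, 0.1, 0.5, 0.3],
    true_counts=[5, 0, 1, 4],
    k=2,
)
```

Here the model selects sites 0 and 2 (reach 6), while the hindsight-best choice is sites 0 and 3 (reach 9), giving a BPR of 2/3. Because the selection step relies on a discrete sort, the value does not change under small changes to the scores, which is consistent with the all-zero gradient issue the abstract mentions.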