Toon Vanderschueren

LG
h-index30
6papers
39citations
Novelty55%
AI Score28

6 Papers

MLJun 7, 2023
Accounting For Informative Sampling When Learning to Forecast Treatment Outcomes Over Time

Toon Vanderschueren, Alicia Curth, Wouter Verbeke et al.

Machine learning (ML) holds great potential for accurately forecasting treatment outcomes over time, which could ultimately enable the adoption of more individualized treatment strategies in many practical applications. However, a significant challenge that has been largely overlooked by the ML literature on this topic is the presence of informative sampling in observational data. When instances are observed irregularly over time, sampling times are typically not random, but rather informative -- depending on the instance's characteristics, past outcomes, and administered treatments. In this work, we formalize informative sampling as a covariate shift problem and show that it can prohibit accurate estimation of treatment outcomes if not properly accounted for. To overcome this challenge, we present a general framework for learning treatment outcomes in the presence of informative sampling using inverse intensity-weighting, and propose a novel method, TESAR-CDE, that instantiates this framework using Neural CDEs. Using a simulation environment based on a clinical use case, we demonstrate the effectiveness of our approach in learning under informative sampling.

GNJun 3, 2022
Prescriptive maintenance with causal machine learning

Toon Vanderschueren, Robert Boute, Tim Verdonck et al.

Machine maintenance is a challenging operational problem, where the goal is to plan sufficient preventive maintenance to avoid machine failures and overhauls. Maintenance is often imperfect in reality and does not make the asset as good as new. Although a variety of imperfect maintenance policies have been proposed in the literature, these rely on strong assumptions regarding the effect of maintenance on the machine's condition, assuming the effect is (1) deterministic or governed by a known probability distribution, and (2) machine-independent. This work proposes to relax both assumptions by learning the effect of maintenance conditional on a machine's characteristics from observational data on similar machines using existing methodologies for causal inference. By predicting the maintenance effect, we can estimate the number of overhauls and failures for different levels of maintenance and, consequently, optimize the preventive maintenance frequency to minimize the total estimated cost. We validate our proposed approach using real-life data on more than 4,000 maintenance contracts from an industrial partner. Empirical results show that our novel, causal approach accurately predicts the maintenance effect and results in individualized maintenance schedules that are more accurate and cost-effective than supervised or non-individualized approaches.

LGSep 7, 2023
Using representation balancing to learn conditional-average dose responses from clustered data

Christopher Bockel-Rickermann, Toon Vanderschueren, Jeroen Berrevoets et al.

Estimating a unit's responses to interventions with an associated dose, the "conditional average dose response" (CADR), is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such a response typically needs to be estimated from observational data, which introduces several challenges. That is why the machine learning (ML) community has proposed several tailored CADR estimators. Yet, the proposal of most of these methods requires strong assumptions on the distribution of data and the assignment of interventions, which go beyond the standard assumptions in causal inference. Whereas previous works have so far focused on smooth shifts in covariate distributions across doses, in this work, we will study estimating CADR from clustered data and where different doses are assigned to different segments of a population. On a novel benchmarking dataset, we show the impacts of clustered data on model performance and propose an estimator, CBRNet, that learns cluster-agnostic and hence dose-agnostic covariate representations through representation balancing for unbiased CADR inference. We run extensive experiments to illustrate the workings of our method and compare it with the state of the art in ML for CADR estimation.

LGMay 3, 2024
Metalearners for Ranking Treatment Effects

Toon Vanderschueren, Wouter Verbeke, Felipe Moraes et al.

Efficiently allocating treatments with a budget constraint constitutes an important challenge across various domains. In marketing, for example, the use of promotions to target potential customers and boost conversions is limited by the available budget. While much research focuses on estimating causal effects, there is relatively limited work on learning to allocate treatments while considering the operational context. Existing methods for uplift modeling or causal inference primarily estimate treatment effects, without considering how this relates to a profit maximizing allocation policy that respects budget constraints. The potential downside of using these methods is that the resulting predictive model is not aligned with the operational context. Therefore, prediction errors are propagated to the optimization of the budget allocation problem, subsequently leading to a suboptimal allocation policy. We propose an alternative approach based on learning to rank. Our proposed methodology directly learns an allocation policy by prioritizing instances in terms of their incremental profit. We propose an efficient sampling procedure for the optimization of the ranking model to scale our methodology to large-scale data sets. Theoretically, we show how learning to rank can maximize the area under a policy's incremental profit curve. Empirically, we validate our methodology and show its effectiveness in practice through a series of experiments on both synthetic and real-world data.

LGJun 12, 2024
Sources of Gain: Decomposing Performance in Conditional Average Dose Response Estimation

Christopher Bockel-Rickermann, Toon Vanderschueren, Tim Verdonck et al.

Estimating conditional average dose responses (CADR) is an important but challenging problem. Estimators must correctly model the potentially complex relationships between covariates, interventions, doses, and outcomes. In recent years, the machine learning community has shown great interest in developing tailored CADR estimators that target specific challenges. Their performance is typically evaluated against other methods on (semi-) synthetic benchmark datasets. Our paper analyses this practice and shows that using popular benchmark datasets without further analysis is insufficient to judge model performance. Established benchmarks entail multiple challenges, whose impacts must be disentangled. Therefore, we propose a novel decomposition scheme that allows the evaluation of the impact of five distinct components contributing to CADR estimator performance. We apply this scheme to eight popular CADR estimators on four widely-used benchmark datasets, running nearly 1,500 individual experiments. Our results reveal that most established benchmarks are challenging for reasons different from their creators' claims. Notably, confounding, the key challenge tackled by most estimators, is not an issue in any of the considered datasets. We discuss the major implications of our findings and present directions for future research.

LGFeb 9, 2022
A new perspective on classification: optimally allocating limited resources to uncertain tasks

Toon Vanderschueren, Bart Baesens, Tim Verdonck et al.

A central problem in business concerns the optimal allocation of limited resources to a set of available tasks, where the payoff of these tasks is inherently uncertain. In credit card fraud detection, for instance, a bank can only assign a small subset of transactions to their fraud investigations team. Typically, such problems are solved using a classification framework, where the focus is on predicting task outcomes given a set of characteristics. Resources are then allocated to the tasks that are predicted to be the most likely to succeed. However, we argue that using classification to address task uncertainty is inherently suboptimal as it does not take into account the available capacity. Therefore, we first frame the problem as a type of assignment problem. Then, we present a novel solution using learning to rank by directly optimizing the assignment's expected profit given limited, stochastic capacity. This is achieved by optimizing a specific instance of the net discounted cumulative gain, a commonly used class of metrics in learning to rank. Empirically, we demonstrate that our new method achieves higher expected profit and expected precision compared to a classification approach for a wide variety of application areas and data sets. This illustrates the benefit of an integrated approach and of explicitly considering the available resources when learning a predictive model.