Anticipating Performativity by Predicting from Predictions
This work addresses a key challenge in understanding and mitigating performative feedback loops in predictive modeling, which is crucial for deploying fair and effective models in social domains like education and finance.
The paper studies how to estimate the causal effect of predictions on outcomes in performative settings, where predictions can influence the outcomes they aim to predict. It highlights three scenarios in which this causal relationship can be recovered from observational data. Empirically, it shows that, when these conditions hold, supervised learning methods can find transferable functional relationships that support conclusions about newly deployed models.
Predictions about people, such as their expected educational achievement or their credit risk, can be performative and shape the outcome that they aim to predict. Understanding the causal effect of these predictions on the eventual outcomes is crucial for foreseeing the implications of future predictive models and selecting which models to deploy. However, this causal estimation task poses unique challenges: model predictions are usually deterministic functions of input features and highly correlated with outcomes. This can make the causal effect of predictions on outcomes impossible to disentangle from the direct effect of the covariates. We study this problem through the lens of causal identifiability. Despite the hardness of the problem in full generality, we highlight three natural scenarios where the causal relationship between covariates, predictions and outcomes can be identified from observational data: randomization in predictions, overparameterization of the predictive model deployed during data collection, and discrete prediction outputs. Empirically, we show that, when our identifiability conditions hold, standard variants of supervised learning that predict from predictions, treating the prediction as an input feature, can indeed find transferable functional relationships that allow for conclusions about newly deployed predictive models. These positive results fundamentally rely on model predictions being recorded during data collection, which highlights the importance of rethinking standard data collection practices to enable progress towards a better understanding of social outcomes and performative feedback loops.
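To make "predicting from predictions" concrete, the following is a minimal sketch, not the paper's experimental setup. It assumes a hypothetical linear outcome model in which the outcome depends on one covariate and on the deployed prediction, and it illustrates only the first scenario (randomization in predictions). The function names, coefficients, and noise levels are illustrative assumptions. With randomized predictions, regressing the outcome on both the covariate and the recorded prediction recovers the performative effect; with a purely deterministic linear prediction, the two inputs are collinear and the effect cannot be separated from the direct effect of the covariate.

```python
import numpy as np

def simulate(n, beta, noise_scale, rng):
    """Hypothetical data: outcome y depends on covariate x and deployed prediction y_hat."""
    x = rng.normal(size=n)
    # Deployed model: deterministic in x, plus optional randomization (assumed form).
    y_hat = 2.0 * x + noise_scale * rng.normal(size=n)
    # Outcome: direct effect of x plus performative effect beta * y_hat (assumed linear).
    y = 1.5 * x + beta * y_hat + 0.1 * rng.normal(size=n)
    return x, y_hat, y

def fit_from_predictions(x, y_hat, y):
    """Least-squares fit of y on (x, y_hat, intercept), treating y_hat as a feature."""
    design = np.column_stack([x, y_hat, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # order: [effect of x, effect of y_hat, intercept]

rng = np.random.default_rng(0)

# Randomized predictions: x and y_hat vary independently enough to identify beta.
x, y_hat, y = simulate(5000, beta=0.5, noise_scale=1.0, rng=rng)
coef_randomized = fit_from_predictions(x, y_hat, y)

# Deterministic predictions: y_hat is an exact multiple of x, so the design is rank deficient.
x0, y_hat0, _ = simulate(5000, beta=0.5, noise_scale=0.0, rng=rng)
design0 = np.column_stack([x0, y_hat0, np.ones_like(x0)])
rank_deterministic = np.linalg.matrix_rank(design0)

print("estimated effect of prediction (randomized):", coef_randomized[1])
print("design rank without randomization:", rank_deterministic)
```

In the randomized case the coefficient on the prediction should land close to the assumed value of 0.5, while in the deterministic case the design matrix has rank 2 rather than 3, so no unique split between the two effects exists under this linear model. Either analysis is only possible because the deployed prediction was recorded alongside the covariate and the outcome.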