Modeling and Controlling Deployment Reliability under Temporal Distribution Shift
This work addresses deployment reliability for high-stakes tabular applications such as credit risk. It offers a framework for balancing reliability stability against intervention cost, although the contribution is incremental in that it builds on existing mitigation strategies such as retraining and recalibration.
The paper addresses the problem of maintaining machine learning model reliability under temporal distribution shift. It proposes a deployment-centric framework that models reliability as a dynamic state and formulates adaptation as a multi-objective control problem. Experiments on a large-scale credit-risk dataset show that selective, drift-triggered interventions achieve smoother reliability trajectories at lower operational cost than continuous rolling retraining.
Machine learning models deployed in non-stationary environments are exposed to temporal distribution shift, which can erode predictive reliability over time. While common mitigation strategies such as periodic retraining and recalibration aim to preserve performance, they typically focus on average metrics evaluated at isolated time points and do not explicitly model how reliability evolves during deployment. We propose a deployment-centric framework that treats reliability as a dynamic state composed of discrimination and calibration. The trajectory of this state across sequential evaluation windows induces a measurable notion of volatility, allowing deployment adaptation to be formulated as a multi-objective control problem that balances reliability stability against cumulative intervention cost. Within this framework, we define a family of state-dependent intervention policies and empirically characterize the resulting cost-volatility Pareto frontier. Experiments on a large-scale, temporally indexed credit-risk dataset (1.35M loans, 2007-2018) show that selective, drift-triggered interventions can achieve smoother reliability trajectories than continuous rolling retraining while substantially reducing operational cost. These findings position deployment reliability under temporal shift as a controllable multi-objective system and highlight the role of policy design in shaping stability-cost trade-offs in high-stakes tabular applications.
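The abstract gives no formulas, so the following is only a minimal sketch of how the reliability state, its volatility, and a drift-triggered policy could be computed. Everything concrete here is an assumption made for illustration: AUC as the discrimination measure, expected calibration error (ECE) as the calibration measure, volatility as the mean Euclidean step between consecutive window states, and a fixed threshold on a scalar drift score as the trigger. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def auc(y_true, y_score):
    """Pairwise AUC: P(score_pos > score_neg), ties counted as 0.5.
    O(n_pos * n_neg), fine for a sketch but not for 1.35M rows."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def ece(y_true, y_prob, n_bins=10):
    """Expected calibration error with equal-width probability bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            total += mask.mean() * gap  # weight by the bin's share of samples
    return float(total)

def reliability_state(y_true, y_prob):
    """Assumed state for one evaluation window: (discrimination, calibration)."""
    return (auc(y_true, y_prob), ece(y_true, y_prob))

def volatility(states):
    """Assumed volatility: mean Euclidean distance between consecutive states."""
    steps = np.diff(np.asarray(states, dtype=float), axis=0)
    return float(np.linalg.norm(steps, axis=1).mean())

def triggered_policy(drift_scores, threshold):
    """Assumed policy: intervene in a window only if its drift score exceeds the threshold."""
    return [bool(s > threshold) for s in drift_scores]

def cumulative_cost(actions, unit_cost=1.0):
    """Total intervention cost, assuming every intervention costs the same."""
    return unit_cost * sum(actions)
```

Under these assumptions, sweeping `threshold` and recording the pair (`cumulative_cost`, `volatility`) for each setting would trace an empirical cost-volatility curve. A threshold below every drift score intervenes in every window, which corresponds to the continuous-retraining baseline.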