Entropy Production in Machine Learning Under Fokker-Planck Probability Flow
This paper addresses how machine learning practitioners in nonstationary environments should balance retraining decisions against operational cost. The contribution is incremental: it builds on existing drift detection methods and adds a new theoretical interpretation.
The paper tackles performance degradation caused by data drift by proposing an entropy-based retraining framework grounded in nonequilibrium statistical physics. In synthetic, financial, and web-traffic domains, the framework achieves predictive performance comparable to frequent retraining while reducing retraining frequency by one to two orders of magnitude. In a biomedical ECG setting, however, it underperforms the maximum-frequency baseline.
Machine learning models deployed in nonstationary environments inevitably experience performance degradation due to data drift. While numerous drift detection heuristics exist, most lack a dynamical interpretation and provide limited guidance on how retraining decisions should be balanced against operational cost. In this work, we propose an entropy-based retraining framework grounded in nonequilibrium statistical physics. Interpreting drift as probability flow governed by a Fokker-Planck equation, we quantify model-data mismatch using relative entropy and show that its time derivative admits an entropy-balance decomposition featuring a nonnegative entropy production term driven by probability currents. Guided by this theory, we implement an entropy-triggered retraining policy using an exponentially weighted moving-average (EWMA) control statistic applied to a streaming kernel density estimator of the Kullback-Leibler divergence. We evaluate this approach across multiple nonstationary data streams. In synthetic, financial, and web-traffic domains, entropy-based retraining achieves predictive performance comparable to frequent retraining while reducing retraining frequency by one to two orders of magnitude. However, in a challenging biomedical ECG setting, the entropy-based trigger underperforms the maximum-frequency baseline, highlighting limitations of feature-space entropy monitoring under complex label-conditional drift.
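The entropy-triggered policy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one-dimensional features, a fixed-bandwidth Gaussian kernel density estimate rebuilt per window (rather than a true streaming estimator), and a plug-in estimate of the Kullback-Leibler divergence. The names `kl_estimate`, `EwmaTrigger`, `run_stream`, and `simulate_drift`, as well as the values of `bandwidth`, `alpha`, and `threshold`, are hypothetical choices made for illustration only.

```python
import math
import random


def gaussian_kde_logpdf(samples, bandwidth):
    """Return a function x -> log density of a 1-D Gaussian KDE over `samples`."""
    log_norm = math.log(len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def logpdf(x):
        # Log-sum-exp over the kernels for numerical stability.
        exponents = [-0.5 * ((x - s) / bandwidth) ** 2 for s in samples]
        m = max(exponents)
        return m + math.log(sum(math.exp(e - m) for e in exponents)) - log_norm

    return logpdf


def kl_estimate(current, reference, bandwidth=0.3):
    """Plug-in estimate of KL(current || reference), averaged over `current`."""
    log_p = gaussian_kde_logpdf(current, bandwidth)
    log_q = gaussian_kde_logpdf(reference, bandwidth)
    return sum(log_p(x) - log_q(x) for x in current) / len(current)


class EwmaTrigger:
    """EWMA control statistic; signals when the smoothed value exceeds a threshold."""

    def __init__(self, alpha=0.2, threshold=0.5):  # assumed values, not from the paper
        self.alpha = alpha
        self.threshold = threshold
        self.value = 0.0

    def update(self, kl):
        self.value = self.alpha * kl + (1.0 - self.alpha) * self.value
        return self.value > self.threshold


def run_stream(windows, reference, trigger):
    """Return the indices of the windows at which retraining is triggered."""
    retrain_steps = []
    for t, window in enumerate(windows):
        if trigger.update(kl_estimate(window, reference)):
            retrain_steps.append(t)
            reference = window   # stand-in for retraining: adopt the current window
            trigger.value = 0.0  # reset the control statistic after retraining
    return retrain_steps


def simulate_drift(seed=0, n=200, n_windows=20, shift_at=10):
    """Synthetic stream: the mean jumps from 0 to 2 at window `shift_at`."""
    rng = random.Random(seed)

    def draw(mean):
        return [rng.gauss(mean, 1.0) for _ in range(n)]

    reference = draw(0.0)
    windows = [draw(0.0 if t < shift_at else 2.0) for t in range(n_windows)]
    return run_stream(windows, reference, EwmaTrigger())


if __name__ == "__main__":
    print(simulate_drift())
```

In this toy setting the trigger stays quiet while the stream matches the reference and fires shortly after the mean shift, which is the behavior behind the reduced retraining frequency reported above. Because the sketch monitors feature-space density only, it would not react to drift that changes labels while leaving the feature distribution intact, consistent with the limitation noted for the ECG setting.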