Causal-discovery-based root-cause analysis and its application in time-series prediction error diagnosis
This work addresses the challenge of trust and reliability in industrial applications by providing more accurate root-cause analysis of prediction errors, though the contribution is incremental because it builds on existing Shapley value methods.
The paper tackles the problem of diagnosing prediction errors in black-box machine learning models. It introduces CD-RCA, a method that estimates causal relationships without a predefined graph and uses Shapley values to quantify each variable's contribution to outliers. Experiments show that it outperforms current heuristic attribution methods.
Recent rapid advances in machine learning have greatly enhanced the accuracy of prediction models, but most models remain "black boxes", which makes diagnosing prediction errors challenging, especially for outliers. This lack of transparency hinders trust and reliability in industrial applications. Heuristic attribution methods, while helpful, often fail to capture true causal relationships, leading to inaccurate error attributions. Various root-cause analysis methods based on Shapley values have been developed, yet they typically require predefined causal graphs, which limits their applicability to prediction errors in machine learning models. To address these limitations, we introduce the Causal-Discovery-based Root-Cause Analysis (CD-RCA) method, which estimates causal relationships between the prediction error and the explanatory variables without needing a predefined causal graph. By simulating synthetic error data, CD-RCA identifies the contribution of each variable to outliers in prediction errors using Shapley values. Extensive experiments show that CD-RCA outperforms current heuristic attribution methods.
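To make the Shapley-value attribution step concrete, the following is a minimal sketch, not the CD-RCA implementation. It computes exact Shapley values for a small set of variables under a hypothetical linear error model. The variable names, coefficients, noise values, and the function `error_with` are illustrative assumptions; in the method described above, the corresponding quantity would come from synthetic error data simulated with the estimated causal model.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values of a set function `value` over `players`.

    Enumerates every coalition, so it is only practical for a handful of players.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical toy setup (assumption, not taken from the paper):
# the prediction error is a linear function of per-variable noise terms.
observed_noise = {"x1": 3.0, "x2": 0.5, "x3": -0.2}
coef = {"x1": 1.0, "x2": 2.0, "x3": 1.0}


def error_with(active):
    """Error when only the variables in `active` take their observed noise
    values; all other variables are held at a baseline of zero."""
    return sum(coef[v] * observed_noise[v] for v in active)


phi = shapley_values(list(observed_noise), error_with)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

Because the toy error model is additive, each variable's Shapley value reduces to its own term in the sum, and the values add up to the total error relative to the baseline (the efficiency property of Shapley values). With a nonlinear or interacting error model the contributions no longer separate this way, and a coalition-based computation like this one is needed to split the error among variables.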