Dabao Zhang

2papers

2 Papers

MLJul 3, 2024
Fast Calculation of Feature Contributions in Boosting Trees

Zhongli Jiang, Min Zhang, Dabao Zhang

Recently, several fast algorithms have been proposed to decompose predicted value into Shapley values, enabling individualized feature contribution analysis in tree models. While such local decomposition offers valuable insights, it underscores the need for a global evaluation of feature contributions. Although coefficients of determination ($R^2$) allow for comparative assessment of individual features, individualizing $R^2$ is challenged by the underlying quadratic losses. To address this, we propose Q-SHAP, an efficient algorithm that reduces the computational complexity of calculating Shapley values for quadratic losses to polynomial time. Our simulations show that Q-SHAP not only improves computational efficiency but also enhances the accuracy of feature-specific $R^2$ estimates.

MEJul 26, 2018
Differential Analysis of Directed Networks

Min Ren, Dabao Zhang

We developed a novel statistical method to identify structural differences between networks characterized by structural equation models. We propose to reparameterize the model to separate the differential structures from common structures, and then design an algorithm with calibration and construction stages to identify these differential structures. The calibration stage serves to obtain consistent prediction by building the L2 regularized regression of each endogenous variables against pre-screened exogenous variables, correcting for potential endogeneity issue. The construction stage consistently selects and estimates both common and differential effects by undertaking L1 regularized regression of each endogenous variable against the predicts of other endogenous variables as well as its anchoring exogenous variables. Our method allows easy parallel computation at each stage. Theoretical results are obtained to establish nonasymptotic error bounds of predictions and estimates at both stages, as well as the consistency of identified common and differential effects. Our studies on synthetic data demonstrated that our proposed method performed much better than independently constructing the networks. A real data set is analyzed to illustrate the applicability of our method.