MLSep 29, 2025
Conservative Decisions with Risk ScoresYishu Wei, Wen-Yee Lee, George Ekow Quaye et al.
In binary classification applications, conservative decision-making that allows for abstention can be advantageous. To this end, we introduce a novel approach that determines the optimal cutoff interval for risk scores, which can be directly available or derived from fitted models. Within this interval, the algorithm refrains from making decisions, while outside the interval, classification accuracy is maximized. Our approach is inspired by support vector machines (SVM), but differs in that it minimizes the classification margin rather than maximizing it. We provide the theoretical optimal solution to this problem, which holds important practical implications. Our proposed method not only supports conservative decision-making but also inherently results in a risk-coverage curve. Together with the area under the curve (AUC), this curve can serve as a comprehensive performance metric for evaluating and comparing classifiers, akin to the receiver operating characteristic (ROC) curve. To investigate and illustrate our approach, we conduct both simulation studies and a real-world case study in the context of diagnosing prostate cancer.
MLSep 22, 2025
End-Cut Preference in Survival TreesXiaogang Su
The end-cut preference (ECP) problem, referring to the tendency to favor split points near the boundaries of a feature's range, is a well-known issue in CART (Breiman et al., 1984). ECP may induce highly imbalanced and biased splits, obscure weak signals, and lead to tree structures that are both unstable and difficult to interpret. For survival trees, we show that ECP also arises when using greedy search to select the optimal cutoff point by maximizing the log-rank test statistic. To address this issue, we propose a smooth sigmoid surrogate (SSS) approach, in which the hard-threshold indicator function is replaced by a smooth sigmoid function. We further demonstrate, both theoretically and through numerical illustrations, that SSS provides an effective remedy for mitigating or avoiding ECP.
MLSep 14, 2017
Random Forests of Interaction Trees for Estimating Individualized Treatment Effects in Randomized TrialsXiaogang Su, Annette T. Peña, Lei Liu et al.
Assessing heterogeneous treatment effects has become a growing interest in advancing precision medicine. Individualized treatment effects (ITE) play a critical role in such an endeavor. Concerning experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees (Su et al., 2009). To this end, we first propose a smooth sigmoid surrogate (SSS) method, as an alternative to greedy search, to speed up tree construction. RFIT outperforms the traditional `separate regression' approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT can be obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial.
MLSep 13, 2017
Weighted Orthogonal Components Regression AnalysisXiaogang Su, Yaa Wonkye, Pei Wang et al.
In the multiple linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new better variants. Specifically, we advocate weighting components based on their correlations with the response, which leads to enhanced predictive performance. Both simulated studies and real data examples are provided to assess and illustrate the advantages of the proposed methods.