Sangwook Kang

h-index: 1

2 Papers

ML · Jun 13, 2022
Deep Neural Network Based Accelerated Failure Time Models using Rank Loss

Gwangsu Kim, Sangwook Kang

An accelerated failure time (AFT) model assumes a log-linear relationship between failure times and a set of covariates. In contrast to other popular survival models that work on hazard functions, the covariates act directly on the failure times, so their effects are intuitive to interpret. The semiparametric AFT model, which leaves the error distribution unspecified, is flexible and robust to departures from distributional assumptions. Owing to these desirable features, this class of models has been considered a promising alternative to the popular Cox model in the analysis of censored failure time data. However, these AFT models typically assume a linear predictor for the mean, and little research has addressed nonlinear predictors when modeling the mean. Deep neural networks (DNNs) have received considerable attention over the past decades and have achieved remarkable success in a variety of fields. DNNs have a number of notable advantages and have been shown to be particularly useful in handling nonlinearity. Taking advantage of this, we propose to fit AFT models with DNNs using a Gehan-type loss, combined with a sub-sampling technique. Finite-sample properties of the proposed DNN and rank-based AFT model (DeepR-AFT) are investigated via an extensive simulation study. DeepR-AFT shows superior performance over its parametric or semiparametric counterparts when the predictor is nonlinear. For linear predictors, DeepR-AFT performs better when the dimension of the covariates is large. The proposed DeepR-AFT is illustrated using two real datasets, which demonstrate its superiority.
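A Gehan-type loss can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' DeepR-AFT code. It assumes the standard Gehan objective from the rank-based AFT literature: the residuals are e_i = log(T_i) - f(x_i), and the loss averages delta_i * max(e_j - e_i, 0) over pairs (i, j). The predictions `preds` stand in for the DNN output f(x_i). The optional `n_pairs` argument, which averages over randomly drawn pairs, is an assumed stand-in for the paper's sub-sampling technique.

```python
import random


def gehan_loss(log_times, events, preds, n_pairs=None, seed=0):
    """Gehan-type rank loss for an AFT model (illustrative sketch).

    log_times: observed log failure/censoring times log(T_i)
    events:    censoring indicators delta_i (1 = failure observed, 0 = censored)
    preds:     model predictions f(x_i), e.g. the output of a neural network
    n_pairs:   if given, average over this many randomly drawn (i, j) pairs
               instead of all n*n pairs (hypothetical sub-sampling scheme)
    """
    n = len(log_times)
    # Residuals on the log-time scale: e_i = log(T_i) - f(x_i).
    resid = [y - f for y, f in zip(log_times, preds)]
    if n_pairs is None:
        pairs = ((i, j) for i in range(n) for j in range(n))
        count = n * n
    else:
        if n_pairs <= 0:
            raise ValueError("n_pairs must be positive")
        rng = random.Random(seed)
        pairs = [(rng.randrange(n), rng.randrange(n)) for _ in range(n_pairs)]
        count = n_pairs
    total = 0.0
    for i, j in pairs:
        # Only uncensored subjects i contribute; penalize e_j exceeding e_i.
        if events[i]:
            total += max(resid[j] - resid[i], 0.0)
    return total / count
```

For example, `gehan_loss([1.0, 2.0, 3.0], [1, 0, 1], [0.0, 0.0, 0.0])` returns 1/3. Because the loss depends only on differences of residuals, adding the same constant to every prediction leaves it unchanged, so an intercept is not identified by this loss. Sampling m pairs reduces the cost from O(n²) to O(m) at the price of a noisier objective.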

ME · Jul 23, 2025
Penalized Empirical Likelihood for Doubly Robust Causal Inference under Contamination in High Dimensions

Byeonghee Lee, Sangwook Kang, Ju-Hyun Park et al.

We propose a doubly robust estimator for the average treatment effect in high-dimensional, low-sample-size (HDLSS) observational studies, where contamination and model misspecification pose serious inferential challenges. The estimator combines bounded-influence estimating equations for outcome modeling with covariate balancing propensity scores for treatment assignment, embedded within a penalized empirical likelihood framework using nonconvex regularization. It satisfies the oracle property by jointly achieving consistency under partial model correctness, selection consistency, robustness to contamination, and asymptotic normality. For uncertainty quantification, we derive a finite-sample confidence interval using cumulant generating functions and influence function corrections, avoiding reliance on asymptotic approximations. Simulation studies and applications to gene expression datasets (Golub and Khan) demonstrate superior performance in bias, error metrics, and interval calibration, highlighting the method's robustness and inferential validity in HDLSS regimes. Notably, even in the absence of contamination, the proposed estimator and its confidence interval remain efficient compared with those of competing models.
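As background for the "doubly robust" property, the classical augmented inverse probability weighting (AIPW) estimator combines an outcome model with a propensity score model. It remains consistent if either model is correctly specified. The sketch below shows that textbook estimator, not the authors' penalized empirical likelihood method. It omits the bounded-influence estimating equations, covariate balancing propensity scores, and nonconvex penalty. It also takes already-fitted nuisance values (`e_hat`, `mu1_hat`, `mu0_hat`) as hypothetical inputs.

```python
def aipw_ate(y, a, e_hat, mu1_hat, mu0_hat):
    """Textbook AIPW estimate of the average treatment effect (sketch).

    y:       observed outcomes
    a:       binary treatment indicators (1 = treated, 0 = control)
    e_hat:   fitted propensity scores P(A = 1 | X), assumed given
    mu1_hat: fitted outcome-model predictions under treatment, assumed given
    mu0_hat: fitted outcome-model predictions under control, assumed given
    """
    n = len(y)
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, e_hat, mu1_hat, mu0_hat):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model prediction plus an inverse-probability-weighted
        # residual correction; the correction vanishes if the outcome
        # model fits exactly, and it repairs bias if the propensity is right.
        term1 = m1 + ai * (yi - m1) / ei
        term0 = m0 + (1 - ai) * (yi - m0) / (1 - ei)
        total += term1 - term0
    return total / n
```

With constant propensities of 0.5 and zero outcome predictions, this reduces to the difference in treated and control means. For instance, `aipw_ate([3.0, 1.0, 4.0, 2.0], [1, 0, 1, 0], [0.5] * 4, [0.0] * 4, [0.0] * 4)` gives 2.0.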