Shi-ang Qi

LG
h-index: 9
8 papers
276 citations
Novelty: 52%
AI Score: 52

8 Papers

LG · Jun 1, 2023
An Effective Meaningful Way to Evaluate Survival Models

Shi-ang Qi, Neeraj Kumar, Mahtab Farrokh et al.

One straightforward metric to evaluate a survival prediction model is based on the Mean Absolute Error (MAE) -- the average of the absolute difference between the time predicted by the model and the true event time, over all subjects. Unfortunately, computing it is challenging because, in practice, the test set includes (right) censored individuals, meaning we do not know when a censored individual actually experienced the event. In this paper, we explore various metrics to estimate MAE for survival datasets that include (many) censored individuals. Moreover, we introduce a novel and effective approach for generating realistic semi-synthetic survival datasets to facilitate the evaluation of metrics. Our findings, based on the analysis of the semi-synthetic datasets, reveal that our proposed metric (MAE using pseudo-observations) is able to rank models accurately based on their performance, and often closely matches the true MAE -- in particular, it is better than several alternative methods.
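To make the censoring problem concrete, here is a minimal sketch of two simple ways to score predicted event times when some subjects are right-censored. It does not reproduce the pseudo-observation metric the abstract proposes. The function names `mae_uncensored` and `mae_hinge` and the toy data are assumptions chosen for illustration.

```python
from typing import Sequence


def mae_uncensored(pred: Sequence[float], time: Sequence[float],
                   event: Sequence[bool]) -> float:
    """MAE over subjects with an observed event; censored subjects are dropped."""
    errors = [abs(p - t) for p, t, e in zip(pred, time, event) if e]
    if not errors:
        raise ValueError("no uncensored subjects to score")
    return sum(errors) / len(errors)


def mae_hinge(pred: Sequence[float], time: Sequence[float],
              event: Sequence[bool]) -> float:
    """Hinge-style MAE: a censored subject is penalized only when the
    predicted time falls before the censoring time."""
    errors = []
    for p, t, e in zip(pred, time, event):
        if e:
            errors.append(abs(p - t))       # true event time is known
        else:
            errors.append(max(t - p, 0.0))  # only a lower bound is known
    return sum(errors) / len(errors)


# Hypothetical toy data: the second subject is censored at time 10.
pred = [5.0, 12.0, 3.0]
time = [4.0, 10.0, 6.0]
event = [True, False, True]

print(mae_uncensored(pred, time, event))
print(mae_hinge(pred, time, event))
```

Both variants are biased in different ways: dropping censored subjects ignores them entirely, while the hinge version treats any prediction past the censoring time as error-free. That gap is what motivates estimating MAE more carefully.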

AI · Jun 27, 2023
ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis

Yakun Yu, Mingjun Zhao, Shi-ang Qi et al. · tencent-ai

Multimodal Sentiment Analysis leverages multimodal signals to detect the sentiment of a speaker. Previous approaches concentrate on performing multimodal fusion and representation learning based on general knowledge obtained from pretrained models, which neglects the effect of domain-specific knowledge. In this paper, we propose Contrastive Knowledge Injection (ConKI) for multimodal sentiment analysis, where specific-knowledge representations for each modality can be learned together with general knowledge representations via knowledge injection based on an adapter architecture. In addition, ConKI uses a hierarchical contrastive learning procedure performed between knowledge types within every single modality, across modalities within each sample, and across samples to facilitate the effective learning of the proposed representations, hence improving multimodal sentiment predictions. The experiments on three popular multimodal sentiment analysis benchmarks show that ConKI outperforms all prior methods on a variety of performance metrics.

LG · Sep 10, 2024 · Code
MENSA: A Multi-Event Network for Survival Analysis with Trajectory-based Likelihood Estimation

Christian Marius Lillelund, Ali Hossein Gharari Foomani, Weijie Sun et al.

Most existing time-to-event methods focus on either single-event or competing-risks settings, leaving multi-event scenarios relatively underexplored. In many healthcare applications, for example, a patient may experience multiple clinical events that can be non-exclusive and semi-competing. A common workaround is to train independent single-event models for such multi-event problems, but this approach fails to exploit dependencies and shared structures across events. To overcome these limitations, we propose MENSA (Multi-Event Network for Survival Analysis), a deep learning model that jointly learns flexible time-to-event distributions for multiple events, whether competing or co-occurring. In addition, we introduce a novel trajectory-based likelihood term that captures the temporal ordering between events. Across four multi-event datasets, MENSA improves predictive performance over many state-of-the-art baselines. Source code is available at https://github.com/thecml/mensa.

96.2 · LG · May 15 · Code
SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference

Shi-ang Qi, Vahid Balazadeh, Michael Cooper et al.

Survival analysis provides a powerful statistical framework for modeling time-to-event outcomes in the presence of censoring. However, selecting an appropriate estimator from the many specialized survival approaches often requires substantial methodological and domain expertise. We introduce SurvivalPFN, a prior-data fitted network that amortizes Bayesian inference for censored observations through in-context learning. SurvivalPFN is pretrained on a diverse family of synthetic, identifiable, and right-censored data-generating processes, enabling it to amortize survival analysis in a single forward pass during inference. As a result, the model adapts to the effective complexity of each dataset without task-specific training or hyperparameter tuning, avoids restrictive parametric assumptions, and produces calibrated survival distributions. In a large-scale benchmark spanning 61 datasets, 21 methods, and 5 evaluation metrics, SurvivalPFN achieves strong predictive performance and often improves upon established survival models. These results suggest that SurvivalPFN offers a principled and practical foundation model for survival analysis, with potential applications in high-impact domains such as healthcare, finance, and engineering (https://github.com/rgklab/SurvivalPFN).

LG · May 12, 2024
Conformalized Survival Distributions: A Generic Post-Process to Increase Calibration

Shi-ang Qi, Yakun Yu, Russell Greiner

Discrimination and calibration represent two important properties of survival analysis, with the former assessing the model's ability to accurately rank subjects and the latter evaluating the alignment of predicted outcomes with actual events. Given their distinct nature, it is hard for survival models to optimize both simultaneously, especially as many previous results found that improving calibration tends to diminish discrimination performance. This paper introduces a novel approach utilizing conformal regression that can improve a model's calibration without degrading discrimination. We provide theoretical guarantees for the above claim, and rigorously validate the efficiency of our approach across 11 real-world datasets, showcasing its practical applicability and robustness in diverse scenarios.

ML · Feb 26, 2025
Overcoming Dependent Censoring in the Evaluation of Survival Models

Christian Marius Lillelund, Shi-ang Qi, Russell Greiner

Conventional survival metrics, such as Harrell's concordance index (CI) and the Brier Score, rely on the independent censoring assumption for valid inference with right-censored data. However, in the presence of so-called dependent censoring, where the probability of censoring is related to the event of interest, these metrics can give biased estimates of the underlying model error. In this paper, we introduce three new evaluation metrics for survival analysis based on Archimedean copulas that can account for dependent censoring. We also develop a framework to generate realistic, semi-synthetic datasets with dependent censoring to facilitate the evaluation of the metrics. Our experiments in synthetic and semi-synthetic data demonstrate that the proposed metrics can provide more accurate estimates of the model error than conventional metrics under dependent censoring.
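For reference, a minimal sketch of Harrell's concordance index on right-censored data follows. It shows the conventional metric the abstract contrasts against, not the copula-based metrics the paper introduces. The function name `harrell_c_index`, the handling of ties, and the toy data are assumptions for illustration; pairs with tied times are simply skipped here.

```python
from typing import Sequence


def harrell_c_index(risk: Sequence[float], time: Sequence[float],
                    event: Sequence[bool]) -> float:
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i has an observed event and
    time[i] < time[j]; higher risk is expected to mean an earlier event.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the second subject is censored at time 4.
time = [2.0, 4.0, 6.0, 8.0]
event = [True, False, True, True]
risk = [0.9, 0.3, 0.1, 0.5]

print(harrell_c_index(risk, time, event))
```

Which pairs count as comparable depends only on the observed times and censoring flags. If censoring is related to the event of interest, the comparable pairs are no longer a representative sample, which is the source of the bias the abstract describes.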

LG · Oct 27, 2024
Toward Conditional Distribution Calibration in Survival Prediction

Shi-ang Qi, Yakun Yu, Russell Greiner

Survival prediction often involves estimating the time-to-event distribution from censored datasets. Previous approaches have focused on enhancing discrimination and marginal calibration. In this paper, we highlight the significance of conditional calibration for real-world applications -- especially its role in individual decision-making. We propose a method based on conformal prediction that uses the model's predicted individual survival probability at that instance's observed time. This method effectively improves the model's marginal and conditional calibration, without compromising discrimination. We provide asymptotic theoretical guarantees for both marginal and conditional calibration and test it extensively across 15 diverse real-world datasets, demonstrating the method's practical effectiveness and versatility in various settings.
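The quantity at the center of this abstract, a subject's predicted survival probability at that subject's observed time, can be illustrated with a small sketch. This is not the paper's conformal procedure. It assumes a hypothetical exponential model with survival function S_i(t) = exp(-rate_i * t) and only uncensored subjects. For a well-calibrated model, these probabilities are approximately uniform on [0, 1], so the bins should hold roughly equal proportions.

```python
import math
from typing import List, Sequence


def survival_at_observed_time(rates: Sequence[float],
                              times: Sequence[float]) -> List[float]:
    """Predicted S_i(t_i) under an assumed exponential model per subject."""
    return [math.exp(-r * t) for r, t in zip(rates, times)]


def bin_proportions(probs: Sequence[float], n_bins: int = 4) -> List[float]:
    """Share of subjects whose S_i(t_i) falls in each equal-width bin of [0, 1]."""
    counts = [0] * n_bins
    for p in probs:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[idx] += 1
    return [c / len(probs) for c in counts]


# Hypothetical toy data: four uncensored subjects with the same rate.
rates = [1.0, 1.0, 1.0, 1.0]
times = [0.1, 0.5, 1.0, 2.0]

probs = survival_at_observed_time(rates, times)
print(bin_proportions(probs))
```

Checking the bins over the whole test set speaks to marginal calibration. Repeating the check within subgroups of subjects is one informal way to probe the conditional calibration the abstract emphasizes.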

ME · Jun 2, 2025
Stop Chasing the C-index: This Is How We Should Evaluate Our Survival Models

Christian Marius Lillelund, Shi-ang Qi, Russell Greiner et al.

We argue that many survival analysis and time-to-event models are incorrectly evaluated. First, we survey many examples of evaluation approaches in the literature and find that most rely on concordance (C-index). However, the C-index only measures a model's discriminative ability and does not assess other important aspects, such as the accuracy of the time-to-event predictions or the calibration of the model's probabilistic estimates. Next, we present a set of key desiderata for choosing the right evaluation metric and discuss their pros and cons. These are tailored to the challenges in survival analysis, such as sensitivity to miscalibration and various censoring assumptions. We hypothesize that the current development of survival metrics conforms to a double-helix ladder, and that model validity and metric validity must stand on the same rung of the assumption ladder. Finally, we discuss the appropriate methods for evaluating a survival model in practice and summarize various viewpoints opposing our analysis.