Chenpeng Yu

h-index2
2papers

2 Papers

BMApr 8, 2024
A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR molecules

Chenpeng Yu, Xing Fang, Hui Liu

The immune checkpoint inhibitors have demonstrated promising clinical efficacy across various tumor types, yet the percentage of patients who benefit from them remains low. The bindings between tumor antigens and HLA-I/TCR molecules determine the antigen presentation and T-cell activation, thereby playing an important role in the immunotherapy response. In this paper, we propose UnifyImmun, a unified cross-attention transformer model designed to simultaneously predict the bindings of peptides to both receptors, providing more comprehensive evaluation of antigen immunogenicity. We devise a two-phase strategy using virtual adversarial training that enables these two tasks to reinforce each other mutually, by compelling the encoders to extract more expressive features. Our method demonstrates superior performance in predicting both pHLA and pTCR binding on multiple independent and external test sets. Notably, on a large-scale COVID-19 pTCR binding test set without any seen peptide in training set, our method outperforms the current state-of-the-art methods by more than 10\%. The predicted binding scores significantly correlate with the immunotherapy response and clinical outcomes on two clinical cohorts. Furthermore, the cross-attention scores and integrated gradients reveal the amino-acid sites critical for peptide binding to receptors. In essence, our approach marks a significant step toward comprehensive evaluation of antigen immunogenicity.

QMJun 24, 2024
tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity

Xing Fang, Chenpeng Yu, Shiye Tian et al.

The anti-cancer immune response relies on the bindings between T-cell receptors (TCRs) and antigens, which elicits adaptive immunity to eliminate tumor cells. This ability of the immune system to respond to novel various neoantigens arises from the immense diversity of TCR repository. However, TCR diversity poses a significant challenge on accurately predicting antigen-TCR bindings. In this study, we introduce a lightweight masked language model, termed tcrLM, to address this challenge. Our approach involves randomly masking segments of TCR sequences and training tcrLM to infer the masked segments, thereby enabling the extraction of expressive features from TCR sequences. To further enhance robustness, we incorporate virtual adversarial training into tcrLM. We construct the largest TCR CDR3 sequence set with more than 100 million distinct sequences, and pretrain tcrLM on these sequences. The pre-trained encoder is subsequently applied to predict TCR-antigen binding specificity. We evaluate model performance on three test datasets: independent, external, and COVID-19 test set. The results demonstrate that tcrLM not only surpasses existing TCR-antigen binding prediction methods, but also outperforms other mainstream protein language models. More interestingly, tcrLM effectively captures the biochemical properties and positional preference of amino acids within TCR sequences. Additionally, the predicted TCR-neoantigen binding scores indicates the immunotherapy responses and clinical outcomes in a melanoma cohort. These findings demonstrate the potential of tcrLM in predicting TCR-antigen binding specificity, with significant implications for advancing immunotherapy and personalized medicine.