Jun 10, 2024

Contrastive learning of T cell receptor representations

arXiv:2406.06397v2 · 21 citations
AI Analysis

This work addresses the scarcity of specificity-labelled TCR data, a key bottleneck for predicting TCR specificity in immunology, with a domain-specific pre-training approach.

The paper tackles the challenge of predicting how T cell receptors (TCRs) interact with their ligands by introducing SCEPTR, a TCR language model pre-trained with a novel strategy that combines autocontrastive learning and masked-language modelling. SCEPTR achieves state-of-the-art performance, outperforming both existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning.

Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here we introduce a TCR language model called SCEPTR (Simple Contrastive Embedding of the Primary sequence of T cell Receptors), capable of data-efficient transfer learning. Through our model, we introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity.
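The abstract names the two pre-training objectives but not their exact form. As a minimal sketch, assuming a SimCSE-style autocontrastive objective (two views of the same TCR are pulled together with an InfoNCE loss while the other TCRs in the batch act as negatives) added to a standard masked-language-modelling cross-entropy, a combined objective might look like the NumPy code below. The function names, the temperature of 0.05 and the `ac_weight` parameter are hypothetical and are not taken from the SCEPTR code; how the two views are generated is left out.

```python
import numpy as np


def _log_softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def autocontrastive_loss(z1, z2, temperature=0.05):
    """Hypothetical InfoNCE loss: row i of z1 and row i of z2 embed two views of the same TCR.

    Every other row in the batch serves as a negative. The temperature is an
    illustrative assumption, not a setting taken from the paper.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (B, B) scaled cosine similarities
    # Positive pairs sit on the diagonal; average over both matching directions.
    loss_12 = -np.mean(np.diag(_log_softmax(logits, axis=1)))
    loss_21 = -np.mean(np.diag(_log_softmax(logits.T, axis=1)))
    return 0.5 * (loss_12 + loss_21)


def masked_lm_loss(token_logits, targets, mask):
    """Cross-entropy over masked positions only.

    token_logits: (B, L, V) scores, targets: (B, L) integer residue ids,
    mask: (B, L) boolean array marking the positions that were masked.
    """
    log_probs = _log_softmax(token_logits, axis=-1)
    # Pick the log-probability assigned to the true residue at each position.
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return -picked[mask].mean()


def combined_pretraining_loss(z1, z2, token_logits, targets, mask, ac_weight=1.0):
    # ac_weight is a hypothetical knob for balancing the two objectives.
    return masked_lm_loss(token_logits, targets, mask) + ac_weight * autocontrastive_loss(z1, z2)
```

For instance, `autocontrastive_loss(z, z)` on random embeddings is close to zero, while pairing each row with a different TCR's embedding (e.g. `np.roll(z, 1, axis=0)`) gives a much larger loss, which is the signal that pushes embeddings of the same receptor together and different receptors apart.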

Code implementations: 1 repo