QML · GML · Jan 25, 2024

Improving Antibody Humanness Prediction using Patent Data

arXiv:2401.14442v3 · 3 citations · ICML
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in antibody therapeutic development for drug discovery, though it is incremental as it builds on existing prediction methods by incorporating new data sources.

The paper tackles the problem of predicting antibody humanness, a key factor in reducing immunogenic responses in drug discovery, by applying a multi-stage training process to patent data. It reports new state-of-the-art results on five of six inference tasks.

We investigate the potential of patent data for improving antibody humanness prediction using a multi-stage, multi-loss training process. Humanness serves as a proxy for the immunogenic response to antibody therapeutics, one of the major causes of attrition in drug discovery and a challenging obstacle for their use in clinical settings. We pose the initial learning stage as a weakly-supervised contrastive-learning problem, where each antibody sequence is associated with possibly multiple identifiers of function and the objective is to learn an encoder that groups them according to their patented properties. We then freeze a part of the contrastive encoder and continue training it on the patent data using the cross-entropy loss to predict the humanness score of a given antibody sequence. We illustrate the utility of the patent data and our approach by performing inference on three different immunogenicity datasets, unseen during training. Our empirical results demonstrate that the learned model consistently outperforms the alternative baselines and establishes a new state of the art on five out of six inference tasks, irrespective of the metric used.
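The first training stage described in the abstract can be pictured with a small sketch. The code below is an illustration only, not the authors' implementation: it assumes a supervised-contrastive style loss in which two sequences count as a positive pair when their sets of function identifiers overlap, and it operates on precomputed embedding vectors rather than a trained encoder. The names `weak_supcon_loss`, `label_sets`, and `temperature`, as well as the toy vectors and labels, are hypothetical. The second stage (freezing part of the encoder and training with cross-entropy) is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def weak_supcon_loss(embeddings, label_sets, temperature=0.1):
    """Contrastive loss with weak, possibly multi-valued labels.

    Assumption (not taken from the paper): sample j is a positive for
    anchor i when their label sets share at least one identifier.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i and label_sets[i] & label_sets[j]
        ]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarities to every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability over the positives of anchor i.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy data: four "sequences", each tagged with hypothetical function identifiers.
labels = [{"A"}, {"A", "B"}, {"C"}, {"C"}]
grouped = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # matches the labels
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]    # ignores the labels

loss_grouped = weak_supcon_loss(grouped, labels)
loss_mixed = weak_supcon_loss(mixed, labels)
print(loss_grouped < loss_mixed)
```

Embeddings that cluster sequences sharing an identifier give a lower loss than embeddings that ignore the identifiers, which is the behaviour the contrastive stage is meant to encourage in the encoder.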

Code Implementations: 1 repo
