LG · Sep 23, 2025

Improved Therapeutic Antibody Reformatting through Multimodal Machine Learning

Jiayi Xin, Aniruddh Raghu, Nick Bhattacharya, Adam Carr, Melanie Montgomery, Hunter Elliott

arXiv:2509.19604v1 · h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses a specific engineering problem in therapeutic antibody design for biomedical researchers, offering an incremental improvement by optimizing existing methods for better generalization.

The paper tackled the challenge of predicting whether converting therapeutic antibodies between formats would succeed, by developing a multimodal machine learning framework that incorporates sequence and structural data. It found that domain-tailored multimodal models outperformed large pretrained protein language models, achieving high predictive accuracy in a 'new antibody, no data' scenario, which helps prioritize candidates and reduce experimental waste.

Modern therapeutic antibody design often involves composing multi-part assemblages of individual functional domains, each of which may be derived from a different source or engineered independently. While these complex formats can expand disease applicability and improve safety, they present a significant engineering challenge: the function and stability of individual domains are not guaranteed in the novel format, and the entire molecule may no longer be synthesizable. To address these challenges, we develop a machine learning framework to predict "reformatting success" -- whether converting an antibody from one format to another will succeed or not. Our framework incorporates both antibody sequence and structural context, and includes an evaluation protocol that reflects realistic deployment scenarios. In experiments on a real-world antibody reformatting dataset, we find the surprising result that large pretrained protein language models (PLMs) fail to outperform simple, domain-tailored, multimodal representations. This is particularly evident in the most difficult evaluation setting, where we test model generalization to a new starting antibody. In this challenging "new antibody, no data" scenario, our best multimodal model achieves high predictive accuracy, enabling prioritization of promising candidates and reducing wasted experimental effort.
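
The "new antibody, no data" setting described above can be illustrated with a grouped hold-out split, in which every record derived from a given starting antibody lands entirely in either the training set or the test set. The sketch below is only an illustration of that idea, not the paper's actual protocol: the record fields (`parent_id`, `target_format`, `success`), the function name `split_by_parent`, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_parent(records, test_fraction=0.2, seed=0):
    """Hold out whole starting antibodies so no test parent is seen in training.

    `records` is a list of dicts with a hypothetical "parent_id" key that
    identifies the starting antibody each reformatting attempt came from.
    """
    by_parent = defaultdict(list)
    for rec in records:
        by_parent[rec["parent_id"]].append(rec)

    # Shuffle parent IDs (not individual records) with a fixed seed.
    parents = sorted(by_parent)
    random.Random(seed).shuffle(parents)

    # Reserve at least one parent antibody for testing.
    n_test = max(1, round(len(parents) * test_fraction))
    test = [r for p in parents[:n_test] for r in by_parent[p]]
    train = [r for p in parents[n_test:] for r in by_parent[p]]
    return train, test


# Toy, made-up data: 10 starting antibodies, each reformatted into 3 formats.
records = [
    {"parent_id": f"ab{i}", "target_format": fmt, "success": (i + j) % 2 == 0}
    for i in range(10)
    for j, fmt in enumerate(["fmtA", "fmtB", "fmtC"])
]
train, test = split_by_parent(records)
```

A plain random split over records would let the same starting antibody appear on both sides, which tends to overstate how well a model generalizes to an antibody it has never seen.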
