LG AI QM ML | Feb 9, 2023

A Text-guided Protein Design Framework

Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, Jian Tang, Hongyu Guo

arXiv:2302.04611v428.3101 citations | h-index: 43 | Has Code

Originality: Highly original

AI Analysis

This work addresses protein design for biomedical applications by integrating textual knowledge, a novel approach to a known bottleneck.

The authors address AI-assisted protein design by incorporating human-curated text descriptions. They propose ProteinDT, a multi-modal framework that reports over 90% accuracy on text-guided protein generation, the best hit ratio on 12 zero-shot text-guided protein editing tasks, and superior performance on four of six protein property prediction benchmarks.

Current AI-assisted protein design mainly utilizes protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in the text format describing proteins' high-level functionalities. Yet, whether the incorporation of such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three subsequent steps: ProteinCLAP which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality, and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441K text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
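
The alignment step attributed to ProteinCLAP can be illustrated with a generic contrastive objective. The sketch below is a minimal, pure-Python symmetric InfoNCE loss over paired text and protein embeddings. It is an assumption for illustration only: the abstract does not state which loss ProteinCLAP uses, and the function names, the toy 2-dimensional embeddings, and the temperature value are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_prob_of_match(logits, i):
    """Log-softmax of row i evaluated at column i (the true pair)."""
    row = logits[i]
    row_max = max(row)  # subtract the max for numerical stability
    log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
    return row[i] - log_sum_exp


def symmetric_info_nce(text_embs, protein_embs, temperature=0.1):
    """Generic symmetric InfoNCE loss (hypothetical stand-in, not the paper's code).

    text_embs[i] and protein_embs[i] are assumed to describe the same protein;
    every other pairing in the batch is treated as a negative.
    """
    text = [l2_normalize(v) for v in text_embs]
    prot = [l2_normalize(v) for v in protein_embs]
    n = len(text)
    # logits[i][j]: scaled cosine similarity between text i and protein j
    logits = [
        [sum(a * b for a, b in zip(text[i], prot[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    logits_t = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_protein = -sum(log_prob_of_match(logits, i) for i in range(n)) / n
    protein_to_text = -sum(log_prob_of_match(logits_t, i) for i in range(n)) / n
    return 0.5 * (text_to_protein + protein_to_text)


# Toy embeddings: the aligned batch pairs each text with its own protein,
# the mismatched batch swaps the two proteins.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_info_nce(texts, [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = symmetric_info_nce(texts, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, the aligned batch yields a much lower loss than the mismatched one, which is the property a contrastive alignment step relies on. In the framework as described, the aligned text representation would then be passed to the facilitator and the decoder; those stages are not sketched here.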
