Oct 3, 2025

InstructPLM-mu: 1-Hour Fine-Tuning of ESM2 Beats ESM3 in Protein Mutation Predictions

arXiv:2510.03370v2
2 citations
h-index: 10
AI Analysis

This work addresses the challenge of resource-intensive training for researchers in computational biology, offering a practical and efficient alternative to end-to-end models.

The paper addresses the computational cost of training multimodal protein language models. It proposes InstructPLM-mu, a fine-tuning framework in which ESM2, fine-tuned with structural inputs for about one hour, reaches performance comparable to ESM3 on protein mutation-effect prediction.

Multimodal protein language models deliver strong performance on mutation-effect prediction, but training such models from scratch demands substantial computational resources. In this paper, we propose a fine-tuning framework called InstructPLM-mu and try to answer a question: Can multimodal fine-tuning of a pretrained, sequence-only protein language model match the performance of models trained end-to-end? Surprisingly, our experiments show that fine-tuning ESM2 with structural inputs can reach performance comparable to ESM3. To understand how this is achieved, we systematically compare three different feature-fusion designs and fine-tuning recipes. Our results reveal that both the fusion method and the tuning strategy strongly affect final accuracy, indicating that the fine-tuning process is not trivial. We hope this work offers practical guidance for injecting structure into pretrained protein language models and motivates further research on better fusion mechanisms and fine-tuning protocols.
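For readers new to mutation-effect prediction, the sketch below shows one common way to turn a protein language model's per-position amino-acid probabilities into a score for a single substitution: the log-odds of the mutant residue against the wild-type residue. This is an illustrative assumption, not the paper's method; the abstract does not say which scoring rule InstructPLM-mu uses. The function name `mutation_log_odds` and the toy probabilities are hypothetical.

```python
import math


def mutation_log_odds(position_probs, wild_type, mutant):
    """Score one substitution as log p(mutant) - log p(wild_type).

    position_probs: dict mapping amino-acid letters to the model's
    predicted probability at the mutated position (hypothetical input;
    a real model would supply these from its output distribution).
    """
    p_wt = position_probs[wild_type]
    p_mut = position_probs[mutant]
    if p_wt <= 0 or p_mut <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_mut) - math.log(p_wt)


# Toy distribution over four residues at one position (made-up values).
toy_probs = {"A": 0.5, "G": 0.25, "V": 0.125, "W": 0.125}

# A negative score means the model finds the mutant less likely than wild type.
score_a_to_g = mutation_log_odds(toy_probs, wild_type="A", mutant="G")
score_a_to_a = mutation_log_odds(toy_probs, wild_type="A", mutant="A")
```

Under this convention a score of zero means the model has no preference between the two residues, and more negative scores suggest a more disruptive substitution.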

