BMAI Jun 11, 2024

Are Protein Language Models Compute Optimal?

arXiv:2406.07249v2, 10 citations
AI Analysis

This work addresses compute inefficiency in pLMs for computational biology, potentially democratizing their training and application, though it is incremental as it adapts existing NLP methodologies.

The study investigated scaling laws for protein language models (pLMs) to determine compute-optimal configurations, finding that model sizes scale sublinearly with compute and that a 35M model achieved perplexity comparable to larger models like ESM-2 (15B) and xTrimoPGLM (100B) with reduced training tokens.

While protein language models (pLMs) have transformed biological research, the scaling laws governing their improvement remain underexplored. By adapting methodologies from NLP scaling laws, we investigated the optimal ratio between model parameters and training tokens within a fixed compute budget. Our study reveals that pLM sizes scale sublinearly with compute budget, showing diminishing returns in performance as model size increases, and we identify a performance plateau in training loss comparable to the one found in relevant works in the field. Our findings suggest that widely-used pLMs might not be compute-optimal, indicating that larger models could achieve convergence more efficiently. Training a 35M model on a reduced token set, we attained perplexity results comparable to larger models like ESM-2 (15B) and xTrimoPGLM (100B) with a single dataset pass. This work paves the way towards more compute-efficient pLMs, democratizing their training and practical application in computational biology.
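As a rough illustration of what "scaling sublinearly with compute" means, the sketch below fits a power law N_opt = k * C^a to pairs of compute budget and loss-minimizing model size, then derives the token count left over for a given model size. It is not the paper's code. The data points are synthetic, the exponent 0.4 is an arbitrary placeholder rather than a reported result, and the relation C ≈ 6 * N * D is the common approximation from the NLP scaling-law literature, assumed here without confirmation that the paper uses it. The function names are hypothetical.

```python
import math


def fit_power_law(compute, n_opt):
    """Least-squares fit of log(n_opt) = log(k) + a * log(compute).

    Returns (k, a). An exponent a < 1 means the optimal model size
    grows sublinearly with the compute budget.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(n) for n in n_opt]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    k = math.exp(mean_y - a * mean_x)
    return k, a


def tokens_for_budget(compute, n_params):
    """Training tokens D affordable at a fixed budget.

    Assumption: C ~= 6 * N * D (FLOPs per parameter per token),
    borrowed from NLP scaling-law work.
    """
    return compute / (6.0 * n_params)


# Synthetic (budget, optimal size) pairs generated from N = 0.5 * C^0.4.
# These are placeholders, not measurements from the paper.
budgets = [1e17, 1e18, 1e19, 1e20]
sizes = [0.5 * c ** 0.4 for c in budgets]

k, a = fit_power_law(budgets, sizes)
print(f"fitted exponent a = {a:.3f}, sublinear: {a < 1}")
print(f"tokens at C=6e18, N=1e6: {tokens_for_budget(6e18, 1e6):.3g}")
```

In practice the (budget, optimal size) pairs would come from training several model sizes at each fixed budget and picking the size with the lowest loss. The log-log fit is the simplest option and ignores measurement noise in those minima.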

