LG | Oct 22, 2025

g-DPO: Scalable Preference Optimization for Protein Language Models

Constance Ferragu, Jonathan D. Ziegler, Nicolas Deutschmann, Arthur Lindoulsi, Eli Bixby, Cradle ML Team

arXiv:2510.19474v1 | 2 citations | h-index: 1

Originality: Incremental advance

AI Analysis

The paper addresses a computational efficiency problem faced by researchers and practitioners in protein engineering, and offers an incremental improvement over existing DPO methods.

The paper tackles the scalability bottleneck of Direct Preference Optimization (DPO) for protein language models: the number of possible training pairs, and with it training time, grows quadratically with the number of labeled sequences. It introduces g-DPO, which combines sequence-space clustering with group-based likelihood approximations. Performance is statistically indistinguishable from standard DPO, and convergence is 1.8 to 3.7 times faster.

Direct Preference Optimization (DPO) is an effective approach for aligning protein language models with experimental design goals. However, DPO faces a scalability bottleneck: the number of possible training pairs grows quadratically with the number of labeled sequences, leading to prohibitive training times even for modestly sized datasets. We introduce g-DPO, a framework that (i) uses sequence space clustering to prune redundant pairs while preserving training signal, and (ii) amortizes likelihood computations with group-based approximations. Across three protein engineering tasks, g-DPO maintains in-silico and in-vitro performance that is statistically indistinguishable from standard DPO, while converging 1.8 to 3.7 times faster, with greater gains expected as the size of the dataset increases.
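To make the quadratic growth and the effect of pruning concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not say which pairs g-DPO keeps, so the rule used here (keep only pairs whose two sequences fall in the same cluster) is an assumption chosen for illustration. The function names, toy sequences, scores, and hand-assigned cluster labels are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(labeled):
    """Build every (preferred, dispreferred) pair from (sequence, score) tuples.

    With n labeled sequences this yields up to n * (n - 1) / 2 pairs,
    which is the quadratic growth described in the abstract.
    """
    pairs = []
    for (seq_a, score_a), (seq_b, score_b) in combinations(labeled, 2):
        if score_a == score_b:
            continue  # a tie carries no preference signal
        pairs.append((seq_a, seq_b) if score_a > score_b else (seq_b, seq_a))
    return pairs


def within_cluster_pairs(labeled, cluster_of):
    """Keep only pairs whose members share a cluster.

    ASSUMPTION: this pruning rule is illustrative only; the source does not
    specify how g-DPO selects pairs after clustering.
    """
    groups = defaultdict(list)
    for seq, score in labeled:
        groups[cluster_of[seq]].append((seq, score))
    pairs = []
    for members in groups.values():
        pairs.extend(all_pairs(members))
    return pairs


# Toy data: six hypothetical sequences with made-up fitness scores.
labeled = [
    ("MKTA", 0.9), ("MKTG", 0.4), ("MKSA", 0.7),
    ("LLPE", 0.2), ("LLPD", 0.8), ("LIPE", 0.5),
]
# Hand-assigned cluster labels standing in for a real sequence-space clustering.
cluster_of = {"MKTA": 0, "MKTG": 0, "MKSA": 0, "LLPE": 1, "LLPD": 1, "LIPE": 1}

full = all_pairs(labeled)
pruned = within_cluster_pairs(labeled, cluster_of)
print(len(full), len(pruned))  # prints "15 6"
```

In this toy case, six sequences give 15 candidate pairs, and restricting to two clusters of three leaves 6. With k equal-sized clusters the within-cluster count shrinks by roughly a factor of k, so the gap widens as the dataset grows. This is consistent with the abstract's expectation of larger gains on larger datasets, though the actual speedups reported there also depend on the amortized likelihood computations, which this sketch does not model.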
