Relaxed Sequence Sampling for Diverse Protein Design
This work addresses the need for more diverse and biologically plausible protein designs, particularly for tasks such as binder design. It is a new method for a known bottleneck rather than a foundational advance.
The paper tackles limited diversity and designability in protein design by introducing Relaxed Sequence Sampling (RSS), a Markov chain Monte Carlo framework that integrates structural and evolutionary information. On an in silico binder design task, RSS yields 5× more designable structures and 2-3× greater structural diversity than relaxed sequence optimization (RSO) baselines at equal computational cost.
Protein design using structure prediction models such as AlphaFold2 has shown remarkable success, but existing approaches like relaxed sequence optimization (RSO) rely on single-path gradient descent and ignore sequence-space constraints, which limits diversity and designability. We introduce Relaxed Sequence Sampling (RSS), a Markov chain Monte Carlo (MCMC) framework that integrates structural and evolutionary information for protein design. RSS operates in continuous logit space, combining gradient-guided exploration with jumps informed by a protein language model. Its energy function couples AlphaFold2-derived structural objectives with ESM2-derived sequence priors, balancing structural accuracy against biological plausibility. In an in silico protein binder design task, RSS produces 5× more designable structures and 2-3× greater structural diversity than RSO baselines at equal computational cost. These results highlight RSS as a principled approach for efficiently exploring the protein design landscape.
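The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea under loudly stated assumptions. A quadratic toy term stands in for the AlphaFold2-derived structural loss, fixed per-position probabilities `q` stand in for ESM2 marginals, the gradient-guided move is implemented as a Metropolis-adjusted Langevin (MALA) step, and the language-model jump is implemented as an independence Metropolis-Hastings proposal that redraws one position's logits around `log q`. These move types are swapped-in choices, not the paper's confirmed method, and all names (`ToyEnergy`, `mala_step`, `prior_jump`, `rss_sketch`) and parameters (`lam`, `gamma`, `eps`, `sigma`, `p_jump`) are hypothetical.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (one logit column each)

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)  # row-wise, numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class ToyEnergy:
    """HYPOTHETICAL stand-in for the RSS energy E(x) over logits x of shape (L, 20).

    e_struct: squared distance of the soft sequence to a target (placeholder for an
              AlphaFold2-derived loss); e_seq: cross-entropy against per-position
              probabilities q (placeholder for ESM2 marginals); the gamma term is a weak
              Gaussian prior that keeps exp(-E) normalizable, since softmax ignores row shifts.
    """
    def __init__(self, target, q, lam=1.0, gamma=0.01):
        self.target = target
        self.log_q = np.log(np.clip(q, 1e-12, None))
        self.lam, self.gamma = lam, gamma

    def __call__(self, x):
        p = softmax(x)
        e_struct = np.sum((p - self.target) ** 2)
        e_seq = -self.lam * np.sum(p * self.log_q)
        return e_struct + e_seq + 0.5 * self.gamma * np.sum(x ** 2)

    def grad(self, x):
        p = softmax(x)
        g = 2.0 * (p - self.target) - self.lam * self.log_q  # dE/dp
        # chain rule through the row-wise softmax: dE/dx = p * (g - <p, g>)
        gx = p * (g - np.sum(p * g, axis=-1, keepdims=True))
        return gx + self.gamma * x

def mala_step(x, energy, eps, rng):
    """Gradient-guided move: one Metropolis-adjusted Langevin step targeting exp(-E)."""
    g = energy.grad(x)
    x_new = x - eps * g + np.sqrt(2 * eps) * rng.standard_normal(x.shape)
    g_new = energy.grad(x_new)
    # log q(x_new | x) and log q(x | x_new), up to the same constant
    log_fwd = -np.sum((x_new - x + eps * g) ** 2) / (4 * eps)
    log_bwd = -np.sum((x - x_new + eps * g_new) ** 2) / (4 * eps)
    log_alpha = energy(x) - energy(x_new) + log_bwd - log_fwd
    return (x_new, True) if np.log(rng.uniform()) < log_alpha else (x, False)

def prior_jump(x, energy, sigma, rng):
    """Sequence-prior move: redraw one position's logits around log q (independence MH)."""
    i = rng.integers(x.shape[0])
    mean = energy.log_q[i]
    x_new = x.copy()
    x_new[i] = mean + sigma * rng.standard_normal(x.shape[1])
    # log N(x_i; mean, sigma^2) - log N(x_new_i; mean, sigma^2)
    log_prop_ratio = (np.sum((x_new[i] - mean) ** 2)
                      - np.sum((x[i] - mean) ** 2)) / (2 * sigma ** 2)
    log_alpha = energy(x) - energy(x_new) + log_prop_ratio
    return (x_new, True) if np.log(rng.uniform()) < log_alpha else (x, False)

def rss_sketch(energy, n_steps=500, eps=0.01, sigma=1.0, p_jump=0.2, seed=0):
    """Randomly mix the two moves; record the argmax (hard) sequence after each step."""
    rng = np.random.default_rng(seed)
    L = energy.log_q.shape[0]
    x = 0.1 * rng.standard_normal((L, len(AA)))
    samples = []
    for _ in range(n_steps):
        if rng.uniform() < p_jump:
            x, _ = prior_jump(x, energy, sigma, rng)
        else:
            x, _ = mala_step(x, energy, eps, rng)
        samples.append("".join(AA[k] for k in x.argmax(axis=-1)))
    return x, samples

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    L = 12
    q = rng.dirichlet(np.ones(len(AA)), size=L)               # toy "PLM" marginals
    target = np.eye(len(AA))[rng.integers(len(AA), size=L)]   # toy one-hot target
    energy = ToyEnergy(target, q, lam=0.5)
    x, seqs = rss_sketch(energy)
    print(len(set(seqs)), "distinct argmax sequences visited; last:", seqs[-1])
```

Because each move carries its own Metropolis-Hastings correction, choosing between them at random with a fixed probability still leaves the target distribution invariant. This is one simple way to combine local gradient exploration with larger prior-guided jumps, in contrast to the single deterministic path of gradient descent.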