Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates
This work addresses enzyme design for biocatalysis with a method that applies broadly across enzyme families, though its advance over prior generative protein design is incremental.
The paper proposes EnzyGen, a unified generative model that designs enzyme sequences and 3D structures conditioned on functionally important sites and substrates. It reports a 10.79% improvement in substrate binding affinity over the best baseline across 323 testing families.
Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach that learns a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme's amino acid sequence and its three-dimensional (3D) coordinates based on the functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlations across an entire protein sequence and local influence from the nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective comprising a sequence generation loss, a position prediction loss, and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset of 3157 enzyme families covering all available enzymes within the Protein Data Bank (PDB). Experimental results show that EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen's superior capability in designing well-folded and effective enzymes that bind to specific substrates with high affinity.
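
The joint training objective named in the abstract can be sketched as a weighted sum of three terms. The following is a minimal illustration only, not the authors' implementation: the abstract names the three losses without defining them, so the specific forms used here (cross-entropy for the sequence, squared error for the coordinates, binary cross-entropy for enzyme-substrate interaction), the weights `w_pos` and `w_int`, and all function names are assumptions.

```python
import numpy as np


def sequence_loss(logits, targets):
    """Mean cross-entropy over residues (assumed form).

    logits: (L, 20) unnormalized scores over amino acid types.
    targets: (L,) integer indices of the true amino acids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def position_loss(pred_xyz, true_xyz):
    """Mean squared Euclidean distance per residue (assumed form).

    pred_xyz, true_xyz: (L, 3) coordinates.
    """
    return ((pred_xyz - true_xyz) ** 2).sum(axis=1).mean()


def interaction_loss(bind_logit, label):
    """Binary cross-entropy on one enzyme-substrate binding logit (assumed form).

    label is 1 if the enzyme binds the substrate, else 0.
    """
    return (label * np.logaddexp(0.0, -bind_logit)
            + (1 - label) * np.logaddexp(0.0, bind_logit))


def joint_loss(logits, targets, pred_xyz, true_xyz, bind_logit, label,
               w_pos=1.0, w_int=1.0):
    """Weighted sum of the three terms; w_pos and w_int are hypothetical weights."""
    return (sequence_loss(logits, targets)
            + w_pos * position_loss(pred_xyz, true_xyz)
            + w_int * interaction_loss(bind_logit, label))
```

Under these assumptions, a single scalar drives the sequence, structure, and substrate-binding predictions together, which is one plausible reading of what "joint" means in the abstract. The relative weighting of the three terms would need to be taken from the paper itself.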