Generative modeling, design and analysis of spider silk protein sequences for enhanced mechanical properties

arXiv:2309.10170v1 | 43 citations | h-index: 112
Originality: Incremental advance
AI Analysis

This work addresses the challenge of designing synthetic spider silks with enhanced mechanical properties for materials science applications, representing an incremental advance by applying a fine-tuned generative model to a specific domain.

The authors address the shortage of models for exploring sequence-property relationships in spider silk proteins. They propose a custom generative large language model that designs novel sequences with target mechanical properties, generates property combinations not found in nature, and analyzes sequence motifs to clarify their roles in properties such as elastic modulus and strength.

Spider silks are remarkable materials characterized by superb mechanical properties such as strength, extensibility and light weight. Yet, to date, limited models are available to fully explore sequence-property relationships for analysis and design. Here we propose a custom generative large language model to enable design of novel spider silk protein sequences to meet complex combinations of target mechanical properties. The model, pretrained on a large set of protein sequences, is fine-tuned on ~1,000 major ampullate spidroin (MaSp) sequences for which associated fiber-level mechanical properties exist, to yield an end-to-end forward and inverse generative strategy. Performance is assessed through: (1) a novelty analysis and protein type classification for generated spidroin sequences through BLAST searches, (2) property evaluation and comparison with similar sequences, (3) comparison of molecular structures, and (4) a detailed sequence motif analysis. We generate silk sequences with property combinations that do not exist in nature, and develop a deep understanding of the mechanistic roles of sequence patterns in achieving overarching key mechanical properties (elastic modulus, strength, toughness, failure strain). The model provides an efficient approach to expand the silkome dataset, facilitating further sequence-structure analyses of silks, and establishes a foundation for synthetic silk design and optimization.
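The sequence motif analysis mentioned in step (4) can be illustrated with a minimal sketch. This is not the authors' pipeline. The motif patterns (poly-alanine runs, GGX, GPGXX) are motifs commonly discussed for MaSp repeat regions and are assumed here for illustration. The function names and the toy sequence are hypothetical.

```python
import re
from collections import Counter

# Assumed motif patterns (not taken from the paper): motifs commonly
# discussed for MaSp repeat regions. Lookaheads let overlapping
# occurrences be counted individually.
MOTIFS = {
    "poly_A": re.compile(r"A{4,}"),       # one hit per run of >= 4 alanines
    "GGX": re.compile(r"(?=(GG[^G]))"),   # Gly-Gly followed by a non-Gly residue
    "GPGXX": re.compile(r"(?=(GPG..))"),  # Gly-Pro-Gly followed by any two residues
}

def motif_counts(sequence: str) -> Counter:
    """Count occurrences of each assumed motif in an amino-acid sequence."""
    seq = sequence.upper()
    counts = Counter()
    for name, pattern in MOTIFS.items():
        counts[name] = sum(1 for _ in pattern.finditer(seq))
    return counts

def motif_density(sequence: str, per: int = 100) -> dict:
    """Motif occurrences per `per` residues, for comparing sequences of different length."""
    n = len(sequence)
    if n == 0:
        return {}
    return {name: per * c / n for name, c in motif_counts(sequence).items()}

if __name__ == "__main__":
    toy = "GPGQQGGYAAAAAAGGL"  # hypothetical toy fragment, not a real spidroin
    print(motif_counts(toy))
    print(motif_density(toy))
```

Normalizing counts by sequence length is one simple way to compare generated sequences with natural ones of different lengths. Relating such densities to modulus, strength, or toughness would need the fiber-level property data described above.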

Code Implementations: 1 repo
