David L. Kaplan

MTRL-SCI · 4 papers · 58 citations · Novelty 48% · AI Score 42

4 Papers

MTRL-SCI · Sep 18, 2023
Generative modeling, design and analysis of spider silk protein sequences for enhanced mechanical properties

Wei Lu, David L. Kaplan, Markus J. Buehler

Spider silks are remarkable materials characterized by superb mechanical properties such as strength, extensibility and light weight. Yet, to date, few models are available to fully explore sequence-property relationships for analysis and design. Here we propose a custom generative large language model to design novel spider silk protein sequences that meet complex combinations of target mechanical properties. The model, pretrained on a large set of protein sequences, is fine-tuned on ~1,000 major ampullate spidroin (MaSp) sequences with associated fiber-level mechanical properties, yielding an end-to-end forward and inverse generative strategy. Performance is assessed through (1) novelty analysis and protein type classification of generated spidroin sequences through BLAST searches, (2) property evaluation and comparison with similar sequences, (3) comparison of molecular structures, and (4) detailed sequence motif analyses. We generate silk sequences with property combinations that do not exist in nature and develop a deep understanding of the mechanistic roles of sequence patterns in achieving key mechanical properties (elastic modulus, strength, toughness, failure strain). The model provides an efficient approach to expand the silkome dataset, facilitating further sequence-structure analyses of silks, and establishes a foundation for synthetic silk design and optimization.
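As a rough illustration of the sequence motif analysis mentioned above, the sketch below counts a few repeat patterns commonly discussed for MaSp sequences: poly-alanine runs, GGX repeats, and GPGXX repeats. The motif set, the regular expressions, and the function name `count_motifs` are assumptions for illustration; they are not the paper's motif list or code.

```python
import re

# Illustrative motif set (assumption, not the paper's list):
MOTIFS = {
    "polyA": r"A{4,}",        # runs of 4+ alanines, often linked to beta-sheet crystallites
    "GGX": r"GG[A-Z]",        # glycine-glycine-X repeats, typical of MaSp1
    "GPGXX": r"GPG[A-Z]{2}",  # proline-containing repeats, typical of MaSp2
}


def count_motifs(seq: str) -> dict:
    """Count non-overlapping matches of each motif in a one-letter protein sequence."""
    seq = seq.upper()  # accept lowercase input
    # re.findall scans left to right and returns non-overlapping matches
    return {name: len(re.findall(pattern, seq)) for name, pattern in MOTIFS.items()}


print(count_motifs("GGAGGYAAAAAGPGQQGPGYY"))
# {'polyA': 1, 'GGX': 2, 'GPGXX': 2}
```

Because matches are non-overlapping, a run like "GGGG" counts as one GGX hit rather than two; an overlapping count would need a lookahead pattern instead.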

MTRL-SCI · Oct 16, 2023
ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a protein language diffusion model

Bo Ni, David L. Kaplan, Markus J. Buehler

Through evolution, nature has presented a set of remarkable protein materials, including elastins, silks, keratins and collagens, with superior mechanical performance that plays crucial roles in mechanobiology. However, going beyond natural designs to discover proteins that meet specified mechanical properties remains challenging. Here we report a generative model that predicts protein designs to meet complex nonlinear mechanical property-design objectives. Our model leverages deep knowledge of protein sequences from a pre-trained protein language model and maps mechanical unfolding responses to create novel proteins. Via full-atom molecular simulations for direct validation, we demonstrate that the designed proteins are novel and fulfill the targeted mechanical properties, including unfolding energy and mechanical strength, as well as the detailed unfolding force-separation curves. Our model offers rapid pathways to explore the enormous mechanobiological protein sequence space unconstrained by biological synthesis, using mechanical features as targets to enable the discovery of protein materials with superior mechanical properties.
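The two scalar targets named in the ForceGen abstract, mechanical strength and unfolding energy, can be read off a sampled force-separation curve. A minimal sketch, assuming strength is the peak force and unfolding energy is the area under the curve computed with the trapezoidal rule; these definitions and the name `unfolding_metrics` are illustrative and may differ from the paper's exact post-processing.

```python
def unfolding_metrics(separation, force):
    """Return (strength, unfolding_energy) for a sampled force-separation curve.

    Assumed definitions for illustration:
      strength         = peak force along the curve
      unfolding_energy = area under force vs. separation (trapezoidal rule)
    """
    if len(separation) != len(force) or len(force) < 2:
        raise ValueError("need matching separation/force samples with at least 2 points")
    strength = float(max(force))
    energy = 0.0
    for i in range(len(force) - 1):
        dx = separation[i + 1] - separation[i]
        energy += 0.5 * (force[i] + force[i + 1]) * dx  # area of one trapezoid
    return strength, energy


print(unfolding_metrics([0, 1, 2, 3], [0, 2, 2, 0]))
# (2.0, 4.0)
```

The energy is in the product of the input units; with separation in nm and force in pN, for example, it comes out in pN·nm.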

74.0 · CE · Apr 25
Artificial Intelligence for Food Innovation

Bianca Datta, Markus J. Buehler, Yvonne Chow et al.

Global food systems must deliver nutritious, sustainable foods while sharply reducing environmental impact. Yet, food innovation remains slow, empirical, and fragmented. Artificial intelligence (AI) offers a transformative path to link molecular composition to functional performance, connect chemical structure to sensory outcomes, and accelerate cross-disciplinary innovation across the production pipeline. While broadly applicable to food systems, we focus on sustainable proteins (plant-based, fermentation-derived, and cultivated) as a high-impact testbed for AI-driven closed-loop design. We review the applications, opportunities, and challenges of AI for Food as an emerging discipline that integrates ingredient design, formulation development, fermentation and production, texture analysis, sensory science, manufacturing, and recipe generation. We identify four priorities: advancing scientific machine learning with embedded domain priors, treating food as a programmable biomaterial, building self-driving laboratories for automated discovery, and developing deep reasoning models that integrate nutrition and sustainability. Integrating AI responsibly into the food innovation cycle can accelerate the transition to sustainable food systems and establish a predictive, design-driven science of food for human and planetary health.

AI · Nov 27, 2025
Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation

Fiona Y. Wang, Di Sheng Lee, David L. Kaplan et al.

Designing proteins de novo with tailored structural, physicochemical, and functional properties remains a grand challenge in biotechnology, medicine, and materials science, due to the vastness of sequence space and the complex coupling between sequence, structure, and function. Current state-of-the-art generative methods, such as protein language models (PLMs) and diffusion-based architectures, often require extensive fine-tuning, task-specific data, or model reconfiguration to support objective-directed design, which limits their flexibility and scalability. To overcome these limitations, we present a decentralized, agent-based framework inspired by swarm intelligence for de novo protein design. In this approach, multiple large language model (LLM) agents operate in parallel, each assigned to a specific residue position. These agents iteratively propose context-aware mutations by integrating design objectives, local neighborhood interactions, and memory and feedback from previous iterations. This position-wise, decentralized coordination enables emergent design of diverse, well-defined sequences without reliance on motif scaffolds or multiple sequence alignments; we validate the designs experimentally on proteins with alpha-helix and coil structures. Through analyses of residue conservation, structure-based metrics, and sequence convergence and embeddings, we demonstrate that the framework exhibits emergent behaviors and navigates the protein fitness landscape effectively. Our method achieves efficient, objective-directed designs within a few GPU-hours and operates entirely without fine-tuning or specialized training, offering a generalizable and adaptable solution for protein design. Beyond proteins, the approach lays the groundwork for collective LLM-driven design across biomolecular systems and other scientific discovery tasks.
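The position-wise coordination described in this abstract can be sketched as a loop in which each residue position acts as its own agent with a private memory. In this toy version, a seeded random proposal stands in for the LLM agent and a helix-propensity fraction stands in for the real design objectives; both substitutions, and every name in the block (`objective`, `propose`, `swarm_design`, `HELIX_FAVORING`), are hypothetical and not the authors' framework.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Toy objective (assumption): residues with relatively high helix propensity.
HELIX_FAVORING = set("AELMQKR")


def objective(seq):
    """Toy score: fraction of helix-favoring residues in a non-empty sequence."""
    return sum(aa in HELIX_FAVORING for aa in seq) / len(seq)


def propose(i, seq, memory, rng):
    """Stand-in for one LLM agent at position i.

    A real agent would reason over the design objective, its neighbors
    (e.g. seq[i - 1] and seq[i + 1]) and feedback; here we just sample a
    residue that this position has not already had rejected.
    """
    options = [aa for aa in AMINO_ACIDS if aa != seq[i] and aa not in memory[i]]
    return rng.choice(options) if options else seq[i]


def swarm_design(seq, iterations=20, seed=0):
    """Iteratively mutate seq with one hypothetical agent per residue position."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    rng = random.Random(seed)
    seq = list(seq)
    memory = [set() for _ in seq]  # per-position record of rejected residues
    score = objective(seq)
    for _ in range(iterations):
        # All agents propose against the same snapshot (mimicking parallel agents).
        proposals = [propose(i, seq, memory, rng) for i in range(len(seq))]
        for i, aa in enumerate(proposals):
            if aa == seq[i]:
                continue
            trial = seq.copy()
            trial[i] = aa
            trial_score = objective(trial)
            if trial_score > score:  # greedy acceptance: keep only improvements
                seq, score = trial, trial_score
            else:
                memory[i].add(aa)  # feedback: this agent will not retry aa
    return "".join(seq), score


print(swarm_design("GGGGSSSS", iterations=30, seed=1)[1])
# 1.0
```

Proposals are collected from one snapshot of the sequence and then accepted one at a time only if they raise the score, so the score never decreases; rejected residues go into that position's memory, which is a crude analogue of the per-agent feedback the abstract describes.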