Amin Tavakoli

h-index: 19
2 papers

2 Papers

LG · Dec 10, 2025
Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design

Amin Tavakoli, Raswanth Murugan, Ozan Gokdemir et al.

Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains, yet its application to protein sequence modeling and protein language models (PLMs) remains ad hoc. This is in part because high-quality annotated data are far more difficult to obtain for proteins than for natural language. We present a simple and general recipe for fast SFT of PLMs, designed to improve the fidelity, reliability, and novelty of generated protein sequences. Unlike existing approaches that require costly precompiled experimental datasets for SFT, our method leverages the PLM itself, integrating a lightweight curation pipeline with domain-specific filters to construct high-quality training data. These filters can independently refine a PLM's output and identify candidates for in vitro evaluation; when combined with SFT, they enable PLMs to generate more stable and functional enzymes, while expanding exploration into protein sequence space beyond natural variants. Although our approach is agnostic to both the choice of PLM and the protein system, we demonstrate its effectiveness with a genome-scale PLM (GenSLM) applied to the tryptophan synthase enzyme family. The fine-tuned model generates sequences that are not only more novel but also display improved characteristics across both targeted design constraints and emergent protein property measures.

LG · Jul 5, 2025
OrbitAll: A Unified Quantum Mechanical Representation Deep Learning Framework for All Molecular Systems

Beom Seok Kang, Vignesh C. Bhethanabotla, Amin Tavakoli et al.

Despite the success of deep learning methods in quantum chemistry, their representational capacity is most often confined to neutral, closed-shell molecules. However, real-world chemical systems often exhibit complex characteristics, including varying charges, spins, and environments. We introduce OrbitAll, a geometry- and physics-informed deep learning framework that can represent all molecular systems with electronic structure information. OrbitAll utilizes spin-polarized orbital features from the underlying quantum mechanical method and combines them with graph neural networks satisfying SE(3)-equivariance. The resulting framework can represent and process any molecular system with arbitrary charges, spins, and environmental effects. OrbitAll demonstrates superior performance and generalization in predicting charged, open-shell, and solvated molecules, while also robustly extrapolating to molecules significantly larger than those in the training data by leveraging a physics-informed architecture. OrbitAll achieves chemical accuracy using 10 times less training data than competing AI models, with a speedup of approximately $10^3$ to $10^4$ compared to density functional theory.