Steven Rayan

h-index: 9

Papers: 6

Citations: 464

Novelty: 34%

AI Score: 32

Ranked #126,712 of 194,257 authors (top 65%); #4 in MN (top 57%)

6 Papers

1.2 | MN | Jun 10, 2025

GPU-accelerated Modeling of Biological Regulatory Networks

Joyce Reimer, Pranta Saha, Chris Chen et al.

The complex regulatory dynamics of a biological network can be succinctly captured using discrete logic models. Given even sparse time-course data from the system of interest, previous work has shown that global optimization schemes are suitable for proposing logic models that explain the data and make predictions about how the system will behave under varying conditions. Considering the large scale of the parameter search spaces associated with these regulatory systems, performance optimizations on the level of both hardware and software are necessary for making this a practical tool for in silico pharmaceutical research. We show here how the implementation of these global optimization algorithms in a GPU-computing environment can accelerate the solution of these parameter search problems considerably. We carry out parameter searches on two model biological regulatory systems that represent almost an order of magnitude scale-up in complexity, and we find the gains in efficiency from GPU to be a 33%-43% improvement compared to multi-thread CPU implementations and a 33%-1866% increase compared to CPU in serial. These improvements make global optimization of logic model identification a far more attractive and feasible method for in silico hypothesis generation and design of experiments.
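As a rough illustration of what "proposing logic models that explain the data" can mean, the sketch below enumerates every Boolean truth table for one target node with two regulators and keeps those consistent with a few observed transitions. Exhaustive enumeration stands in here for the paper's global optimization schemes and GPU implementation, which are not reproduced; the function name `consistent_rules` and the observations are hypothetical.

```python
from itertools import product

def consistent_rules(observations, n_inputs=2):
    """Return every Boolean truth table that agrees with all observations.

    observations: list of (input_state, next_value) pairs, where input_state
    is a tuple of 0/1 regulator values at time t and next_value is the
    target node's value at time t+1. (Hypothetical toy data format.)
    """
    input_states = list(product([0, 1], repeat=n_inputs))
    matches = []
    # Each candidate rule assigns one output bit to every input state.
    for outputs in product([0, 1], repeat=len(input_states)):
        table = dict(zip(input_states, outputs))
        if all(table[state] == value for state, value in observations):
            matches.append(table)
    return matches

# Sparse, made-up time-course data: the input state (1, 0) is never observed.
observations = [((0, 0), 0), ((1, 1), 1), ((0, 1), 0)]
rules = consistent_rules(observations)
for table in rules:
    print(table)
```

Because one input state is unobserved, two rules survive (logical AND and "copy the first regulator"), which is the kind of ambiguity that sparse data leaves behind. The number of candidate tables grows as 2^(2^n) in the number of regulators n, which is why the search space becomes large so quickly.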

1.2 | QUANT-PH | Jul 17, 2025

Identifying Protein Co-regulatory Network Logic by Solving B-SAT Problems through Gate-based Quantum Computing

Aspen Erlandsson Brisebois, Jason Broderick, Zahed Khatooni et al.

There is growing awareness that the success of pharmacologic interventions on living organisms is significantly impacted by context and timing of exposure. In turn, this complexity has led to an increased focus on regulatory network dynamics in biology and our ability to represent them in a high-fidelity way, in silico. Logic network models show great promise here and their parameter estimation can be formulated as a constraint satisfaction problem (CSP) that is well-suited to the often sparse, incomplete data in biology. Unfortunately, even in the case of Boolean logic, the combinatorial complexity of these problems grows rapidly, challenging the creation of models at physiologically-relevant scales. That said, quantum computing, while still nascent, facilitates novel information-processing paradigms with the potential for transformative impact in problems such as this one. In this work, we take a first step at actualizing this potential by identifying the structure and Boolean decisional logic of a well-studied network linking 5 proteins involved in the neural development of the mammalian cortical area of the brain. We identify the protein-protein connectivity and binary decisional logic governing this network by formulating it as a Boolean Satisfiability (B-SAT) problem. We employ Grover's algorithm to solve the NP-hard problem faster than the exponential time complexity required by deterministic classical algorithms. Using approaches deployed on both quantum simulators and actual noisy intermediate scale quantum (NISQ) hardware, we accurately recover several high-likelihood models from very sparse protein expression data. The results highlight the differential roles of data types in supporting accurate models; the impact of quantum algorithm design as it pertains to the mutability of quantum hardware; and the opportunities for accelerated discovery enabled by this approach.
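To make the combination of a Boolean satisfiability formulation with Grover's algorithm more concrete, here is a minimal classical state-vector simulation of Grover search over a toy 3-variable CNF formula. It is only a sketch: the clauses are invented for illustration and are not the paper's protein network encoding, and simulating the amplitudes classically takes time exponential in the number of variables, so it shows the mechanics rather than any speedup.

```python
import math
from itertools import product

# Toy CNF formula (hypothetical): each clause is a list of
# (variable index, required value) literals, joined by OR; clauses are ANDed.
CLAUSES = [
    [(0, True), (1, True)],
    [(0, False), (2, True)],
    [(1, False), (2, False)],
    [(0, True), (2, False)],
]
N_VARS = 3

def satisfies(bits):
    """True if the assignment satisfies every clause."""
    return all(any(bits[i] == want for i, want in clause) for clause in CLAUSES)

def grover_success_probability():
    states = list(product([False, True], repeat=N_VARS))
    marked = [satisfies(s) for s in states]
    n_solutions = sum(marked)
    if n_solutions == 0:
        raise ValueError("formula is unsatisfiable")
    # Start in the uniform superposition over all assignments.
    amps = [1 / math.sqrt(len(states))] * len(states)
    # Standard iteration count: floor(pi/4 * sqrt(N / M)).
    iterations = math.floor(math.pi / 4 * math.sqrt(len(states) / n_solutions))
    for _ in range(iterations):
        # Oracle: flip the sign of amplitudes of satisfying assignments.
        amps = [-a if m else a for a, m in zip(amps, marked)]
        # Diffusion: reflect every amplitude about the mean.
        mean = sum(amps) / len(amps)
        amps = [2 * mean - a for a in amps]
    success = sum(a * a for a, m in zip(amps, marked) if m)
    solutions = [s for s, m in zip(states, marked) if m]
    return success, solutions, iterations

success, solutions, iterations = grover_success_probability()
print(iterations, solutions, round(success, 6))
```

With two satisfying assignments among eight, a single Grover iteration concentrates essentially all of the measurement probability on the solutions. On real NISQ hardware, noise would reduce this figure.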

8.4 | QUANT-PH | Jun 27

Exploring the Effects of Entanglement on Quantum Machine Learning of Pathogen Epitope-Receptor Binding

Aspen Erlandsson Brisebois, Luis Pablo Gonzalez Dominguez, Shivansi Prajapati et al.

Parameterized quantum circuits (PQCs) provide a flexible substrate for hybrid quantum machine learning (QML), but their practical value on Noisy Intermediate-Scale Quantum (NISQ) devices remains an empirical question, especially because training depth and scale can introduce optimization challenges such as barren plateaus. Here we study how the number and topology of two-qubit entangling gates in the feature-map stage influence a fixed hybrid QNN workflow for classifying strong versus weak epitope-receptor binding in Porcine Reproductive and Respiratory Syndrome (PRRS) vaccine design. The dataset consists of docking-derived binding affinities for N=80 9-mer epitopes, labeled as Strong or Weak binding, and partitioned into training, validation, and test subsets using a 40:30:30 split. We compare a classical CNN benchmark with a hybrid Embedding-QNN architecture under four feature-map configurations: a non-entangling Z feature map, an all-to-all high-entanglement ZZ feature map, and two interleaved nearest-neighbour entanglement patterns of low and high depth. Among the configurations tested, the high-entanglement ZZ feature map is seen to provide the strongest evidence of reduced training-set overfit, with a lower training area under the accuracy curve (AUAC) and the highest test/training AUAC ratio, while preserving competitive test-set accuracy. These results do not establish a general QML advantage, but they suggest that feature-map entanglement topology is a meaningful design variable for sparse biological screening tasks and warrants further evaluation with additional metrics, larger datasets, and noise-aware or hardware-based experiments.

3.3 | BM | Jun 27

Transformer-Based Active Learning for Data-Efficient Vaccine Epitope Selection in PRRS

Aspen Erlandsson Brisebois, Zahed Khatooni, Connor Burbridge et al.

High-fidelity molecular docking simulations can produce biologically relevant estimates of epitope-receptor binding affinity but are computationally expensive and therefore limit the number of candidates that can be screened for vaccine design. In this work, we evaluate machine learning (ML) approaches where variants of active learning are used to classify instances of high binding affinity between 9-mer epitopes and a well-conserved swine leukocyte antigen (SLA) receptor in the context of Porcine Reproductive and Respiratory Syndrome (PRRS). We use an internally generated dataset of 80 epitope-SLA docking affinities, each requiring more than 48 hours of high-performance computing (HPC). Multiple model families (linear, MLP, CNN, and a small transformer) are trained under strict low-data conditions within a pool-based active learning loop. In each case, optimal model configurations are identified by conducting large-scale hyperparameter optimization over the combined space of model architecture, training configuration, acquisition policy, and ensemble decision rules. To mitigate the effects of data subsample selection, each candidate configuration is evaluated by averaging performance over many randomized and balanced training and validation data subsets. Across experiments, transformer-based sequence models consistently emerged as the best-performing architecture, with active incremental learning yielding significant improvement over a baseline random sample acquisition strategy. Under moderate training data availability (N=30), the optimized ML-model configuration outperforms a standard baseline trained on twice the amount of data. Under higher training data availability (N=60), the same configuration achieves a peak accuracy of 86.8%, consistent with an upper bound of 85% classification accuracy based on two independent estimates of conformational noise.

1.2 | MN | Jul 6, 2025

Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models

Pranta Saha, Joyce Reimer, Brook Byrns et al.

The use of generative artificial intelligence (AI) models is becoming ubiquitous in many fields. Though progress continues to be made, general-purpose large language models (LLMs) show a tendency to deliver creative answers, often called "hallucinations", which have slowed their application in the medical and biomedical fields where accuracy is paramount. We propose that the design and use of much smaller, domain- and even task-specific LMs may be a more rational and appropriate use of this technology in biomedical research. In this work we apply a very small LM by today's standards to the specialized task of predicting regulatory interactions between molecular components to fill gaps in our current understanding of intracellular pathways. To this end, we attempt to correctly posit known pathway-informed interactions recovered from manually curated pathway databases by selecting and using only the most informative examples as part of an active learning scheme. With this example we show that a small (~110 million parameters) LM based on a Bidirectional Encoder Representations from Transformers (BERT) architecture can propose molecular interactions relevant to tuberculosis persistence and transmission with over 80% accuracy using fewer than 25% of the ~520 regulatory relationships in question. Using information entropy as a metric for the iterative selection of new tuning examples, we also find that increased accuracy is driven by favoring the use of the incorrectly assigned statements with the highest certainty (lowest entropy). In contrast, the concurrent use of correct but least certain examples contributed little and may have even been detrimental to the learning rate.
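The entropy-based selection rule described in this abstract can be sketched in a few lines. The snippet below computes the Shannon entropy of each predicted class distribution and picks the misclassified examples with the lowest entropy, that is, the most confident errors. The function names, the example identifiers, and the probabilities are all hypothetical, and the paper's BERT model and tuning loop are not reproduced.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_confident_errors(predictions, k):
    """Pick the k misclassified examples with the lowest entropy.

    predictions: list of (example_id, class_probs, true_label) tuples,
    where true_label is the index of the correct class.
    """
    wrong = []
    for example_id, probs, label in predictions:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label:
            wrong.append((entropy(probs), example_id))
    wrong.sort()  # lowest entropy (highest certainty) first
    return [example_id for _, example_id in wrong[:k]]

# Made-up model outputs for four statements.
predictions = [
    ("s1", [0.95, 0.05], 1),  # wrong and very confident
    ("s2", [0.55, 0.45], 1),  # wrong but uncertain
    ("s3", [0.10, 0.90], 1),  # correct
    ("s4", [0.20, 0.80], 0),  # wrong and fairly confident
]
selected = select_confident_errors(predictions, 2)
print(selected)
```

In an active learning loop, the selected examples would be added to the tuning set before the next round; the uncertain error `s2` is deliberately left out, matching the abstract's observation that the most certain mistakes are the informative ones.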

1.2 | QUANT-PH | Nov 29, 2024 | Code

A Graph-Based Classical and Quantum Approach to Deterministic L-System Inference

Ali Lotfi, Ian McQuillan, Steven Rayan

L-systems can be made to model and create simulations of many biological processes, such as plant development. Finding an L-system for a given process is typically done by hand, by experts, in a massively time-consuming process. It would be significant if this could be done automatically from data, such as from sequences of images. In this paper, we are interested in inferring a particular type of L-system, the deterministic context-free L-system (D0L-system), from a sequence of strings. We introduce the characteristic graph of a sequence of strings, which we then utilize to translate our problem (inferring D0L-systems) in polynomial time into the maximum independent set problem (MIS) and the SAT problem. After that, we offer a classical exact algorithm and an approximate quantum algorithm for the problem.
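For readers unfamiliar with the inference problem, the sketch below shows it on the classic "algae" D0L-system: `derive` rewrites every symbol in parallel with its single rule, and `infer_d0l` recovers a rule set from a sequence of strings by naive backtracking. This brute-force search is only a baseline for illustration; it is not the characteristic-graph, MIS, or SAT approach of the paper, and it assumes every successor is non-empty.

```python
def derive(axiom, rules, steps):
    """Apply the D0L rules in parallel and return the string sequence."""
    sequence = [axiom]
    for _ in range(steps):
        sequence.append("".join(rules[symbol] for symbol in sequence[-1]))
    return sequence

def infer_d0l(strings):
    """Find rules mapping each string to the next one, or None if none exist.

    Naive backtracking; assumes non-empty successors (a simplification).
    """
    pairs = list(zip(strings, strings[1:]))

    def search(p, i, j, rules):
        if p == len(pairs):
            return dict(rules)
        u, v = pairs[p]
        if i == len(u):
            # All of u is rewritten: v must be fully consumed as well.
            return search(p + 1, 0, 0, rules) if j == len(v) else None
        symbol = u[i]
        if symbol in rules:
            successor = rules[symbol]
            if v.startswith(successor, j):
                return search(p, i + 1, j + len(successor), rules)
            return None
        # Unknown symbol: try every non-empty candidate successor.
        for end in range(j + 1, len(v) + 1):
            rules[symbol] = v[j:end]
            found = search(p, i + 1, end, rules)
            if found is not None:
                return found
            del rules[symbol]
        return None

    return search(0, 0, 0, {})

algae = {"A": "AB", "B": "A"}
sequence = derive("A", algae, 3)
inferred = infer_d0l(sequence)
print(sequence, inferred)
```

The backtracking search can take exponential time in the worst case, which gives some intuition for why a polynomial-time translation into well-studied problems such as MIS and SAT is attractive.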