BMDec 11, 2024Code
Sampling-based Continuous Optimization with Coupled Variables for RNA DesignWei Yu Tang, Ning Dai, Tianshuo Zhou et al.
The task of RNA design given a target structure aims to find a sequence that can fold into that structure. It is a computationally hard problem where some version(s) have been proven to be NP-hard. As a result, heuristic methods such as local search have been popular for this task, but by only exploring a fixed number of candidates. They can not keep up with the exponential growth of the design space, and often perform poorly on longer and harder-to-design structures. We instead formulate these discrete problems as continuous optimization, which starts with a distribution over all possible candidate sequences, and uses gradient descent to improve the expectation of an objective function. We define novel distributions based on coupled variables to rule out invalid sequences given the target structure and to model the correlation between nucleotides. To make it universally applicable to any objective function, we use sampling to approximate the expected objective function, to estimate the gradient, and to select the final candidate. Compared to the state-of-the-art methods, our work consistently outperforms them in key metrics such as Boltzmann probability, ensemble defect, and energy gap, especially on long and hard-to-design puzzles in the Eterna100 benchmark. Our code is available at: http://github.com/weiyutang1010/ncrna_design.
BMDec 29, 2023
Messenger RNA Design via Expected Partition Function and Continuous OptimizationNing Dai, Wei Yu Tang, Tianshuo Zhou et al.
The tasks of designing RNAs are discrete optimization problems, and several versions of these problems are NP-hard. As an alternative to commonly used local search methods, we formulate these problems as continuous optimization and develop a general framework for this optimization based on a generalization of classical partition function which we call "expected partition function". The basic idea is to start with a distribution over all possible candidate sequences, and extend the objective function from a sequence to a distribution. We then use gradient descent-based optimization methods to improve the extended objective function, and the distribution will gradually shrink towards a one-hot sequence (i.e., a single sequence). As a case study, we consider the important problem of mRNA design with wide applications in vaccines and therapeutics. While the recent work of LinearDesign can efficiently optimize mRNAs for minimum free energy (MFE), optimizing for ensemble free energy is much harder and likely intractable. Our approach can consistently improve over the LinearDesign solution in terms of ensemble free energy, with bigger improvements on longer sequences.