Probabilistic RNA Designability via Interpretable Ensemble Approximation and Dynamic Decomposition
This work provides a novel, interpretable framework for evaluating RNA designability, which is important for RNA inverse folding and synthetic biology applications.
The authors introduce a theory of ensemble approximation and probability decomposition to bound the folding probabilities of RNA structures, enabling the assessment of RNA designability beyond minimum free energy criteria. Their linear-time algorithm produces tighter probability bounds than prior methods on the ArchiveII and Eterna100 benchmarks.
Motivation: RNA design aims to find RNA sequences that fold into a given target secondary structure, a problem also known as RNA inverse folding. However, not all target structures are designable. Recent advances in RNA designability have focused primarily on minimum free energy (MFE)-based criteria, while ensemble-based notions of designability remain largely underexplored. To address this gap, we introduce a theory of ensemble approximation and a probability decomposition framework for bounding the folding probabilities of RNA structures in an explainable way. We further develop a linear-time dynamic programming algorithm that efficiently searches over exponentially many decompositions and identifies the one that yields the tightest probabilistic bound for a given structure.
Results: Applying our methods to both native and artificial RNA structures in the ArchiveII and Eterna100 benchmarks, we obtained probability bounds that are much tighter than those of prior approaches. Our methods also provide anatomical tools for analyzing RNA structures and understanding the sources of design difficulty at the motif level.
Availability: Source code and data are available at https://github.com/shanry/RNA-Undesign.
Supplementary information: Supplementary text and data are available in a separate PDF.
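As a rough illustration of why approximating the ensemble can bound a folding probability, the sketch below uses the standard Boltzmann relation: a structure's probability is its Boltzmann weight divided by the partition function, and summing over only a subset of the competing structures can only underestimate the partition function, so the resulting ratio is an upper bound. This is a minimal sketch of that general principle, not the decomposition algorithm described in the abstract; the function names and the toy free energies are assumptions made purely for illustration.

```python
import math

# Thermal energy RT in kcal/mol at 37 °C (about 1.9872e-3 kcal/(mol*K) * 310.15 K).
RT = 0.6163


def boltzmann_weight(energy_kcal: float) -> float:
    """Boltzmann weight exp(-E / RT) of a structure with free energy E."""
    return math.exp(-energy_kcal / RT)


def probability_upper_bound(target_energy: float, rival_energies: list[float]) -> float:
    """Upper-bound the target's folding probability for one fixed sequence.

    rival_energies holds free energies of distinct rival structures (each
    different from the target) for the same sequence. Because the partial sum
    below omits the rest of the ensemble, it never exceeds the true partition
    function, so the returned ratio is never below the true probability.
    """
    w_target = boltzmann_weight(target_energy)
    partial_z = w_target + sum(boltzmann_weight(e) for e in rival_energies)
    return w_target / partial_z


# Hypothetical energies (kcal/mol): a target and two rival structures.
bound = probability_upper_bound(-3.0, [-3.0, -2.0])
print(f"p(target | sequence) <= {bound:.3f}")
```

Adding more rival structures to the partial sum can only lower the bound, which is the sense in which a better-chosen approximation of the ensemble gives a tighter result.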