Jul 16, 2025

RNAMunin: A Deep Machine Learning Model for Non-coding RNA Discovery

arXiv:2507.11950v1
h-index: 1
Originality: Incremental advance
AI Analysis

The work addresses an important problem in bioinformatics for researchers studying microbial genomes: discovering ncRNAs involved in regulation, stress response, and metabolism. It appears incremental, however, because it builds on existing methods for a known bottleneck.

The paper tackles the challenge of identifying non-coding RNAs (ncRNAs) directly from genomic sequence, without requiring transcriptomics data. It presents RNAMunin, a deep machine learning model that does this at scale. The model is small (approximately 1M parameters) and fast, and is trained on Rfam sequences extracted from approximately 60 Gbp of long-read metagenomes.

Functional annotation of microbial genomes is often biased toward protein-coding genes, leaving a vast, unexplored landscape of non-coding RNAs (ncRNAs) that are critical for regulating bacterial and archaeal physiology, stress response, and metabolism. Identifying ncRNAs directly from genomic sequence is a paramount challenge in bioinformatics and biology, essential for understanding the complete regulatory potential of an organism. This paper presents RNAMunin, a machine learning (ML) model that is capable of finding ncRNAs using genomic sequence alone. It is also computationally viable for large sequence datasets such as long-read metagenomic assemblies with contigs totaling multiple Gbp. RNAMunin is trained on Rfam sequences extracted from approximately 60 Gbp of long-read metagenomes from 16 San Francisco Estuary samples. We know of no other model that can detect ncRNAs based solely on genomic sequence at this scale. Since RNAMunin requires only genomic sequence as input, an ncRNA does not need to be transcribed for us to find it, i.e., we do not need transcriptomics data. We wrote this manuscript in a narrative style in order to best convey how RNAMunin was developed and how it works in detail. At approximately 1M parameters, RNAMunin is, unlike almost all current ML models, very small and very fast.
