MicroFuse: Protein-to-Genome Expert Fusion for Microbial Operon Reasoning

arXiv:2605.08815
AI Analysis

This work addresses the problem of operon prediction for microbiologists, offering a method that handles conflicting signals between protein identity and genomic context.

MicroFuse integrates protein-scale and genome-context representations via a Mixture-of-Experts module to predict microbial operon co-membership, achieving the strongest AUROC, AUPRC, mAP, and mAR on the OG-Operon100K benchmark, with largest gains in biologically ambiguous cases.
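As a rough illustration of soft-routed expert fusion, the sketch below mixes four views of a gene pair (protein, genome-context, agreement, and conflict, the expert names given in the abstract) with softmax gate weights. It is not the authors' implementation: the function names, the elementwise-product "agreement" view, the absolute-difference "conflict" view, and the linear router are all assumptions chosen to keep the example small. In a trained model each expert would presumably be a learned network.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def fuse(protein, genome, router_weights):
    """Soft-route between four hypothetical expert views of a gene pair.

    protein and genome are equal-length embedding vectors;
    router_weights has 4 rows, each of length 2 * len(protein).
    """
    experts = [
        protein,                                        # protein expert
        genome,                                         # genome-context expert
        [p * g for p, g in zip(protein, genome)],       # agreement (assumed: product)
        [abs(p - g) for p, g in zip(protein, genome)],  # conflict (assumed: |difference|)
    ]
    # The router sees both modalities and emits one weight per expert.
    gate = softmax(linear(router_weights, protein + genome))
    fused = [
        sum(g * expert[i] for g, expert in zip(gate, experts))
        for i in range(len(protein))
    ]
    return fused, gate

# Usage: an untrained (all-zero) router weights the four experts equally.
protein = [0.5, -1.0]
genome = [0.5, 1.0]
zero_router = [[0.0] * 4 for _ in range(4)]
fused, gate = fuse(protein, genome, zero_router)
```

Because the gate is a softmax, every expert contributes to every prediction, and the router can shift weight toward the conflict view when the two modalities disagree.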

Predicting microbial operon co-membership requires integrating two complementary biological signals: protein-scale molecular identity and genome-context organization. While recent biological foundation models provide powerful representations of each view independently, naive concatenation of these modalities ignores a key biological property: protein identity and genomic context may agree when adjacent genes form a coherent functional module, or conflict when sequence similarity is misleading but genomic layout indicates independent regulation. We present MicroFuse, a protein-to-genome expert fusion framework that integrates structure-aware protein representations from ProstT5 with genome-context representations from Bacformer through a four-expert Mixture-of-Experts module (protein, genome-context, agreement, and conflict experts) with a learned soft router. Training combines binary cross-entropy with symmetric cross-modal InfoNCE alignment and disagreement-weighted supervised contrastive shaping. We further construct OG-Operon100K, a 100,000-pair scaffold-level benchmark from the OMG metagenomic corpus with biologically grounded positive and negative criteria. On OG-Operon100K, MicroFuse achieves the strongest AUROC, AUPRC, mAP, and mAR among ProstT5-only, Bacformer-only, and Concat MLP baselines. Ablations identify cross-modal contrastive alignment as the dominant component, and a hard sequence-conflict subset reveals MicroFuse's largest gains precisely in biologically ambiguous cases where protein identity alone is misleading.
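The symmetric cross-modal InfoNCE term can be sketched as follows. This is a generic formulation, not the paper's code: it assumes row i of the protein embeddings and row i of the genome-context embeddings describe the same example, uses cosine similarity, and picks a temperature of 0.1 arbitrarily. The abstract does not state these details, nor how this term is weighted against the binary cross-entropy and supervised contrastive terms.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(queries, keys, temperature):
    """Mean cross-entropy of matching query i to key i among all keys."""
    loss = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)
        # log-sum-exp over all keys, computed stably
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(queries)

def symmetric_info_nce(protein_embs, genome_embs, temperature=0.1):
    """Average the protein-to-genome and genome-to-protein directions."""
    p = [normalize(v) for v in protein_embs]
    g = [normalize(v) for v in genome_embs]
    return 0.5 * (info_nce(p, g, temperature) + info_nce(g, p, temperature))

# Usage: aligned pairs give a low loss, mismatched pairs a high one.
protein_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_info_nce(protein_embs, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_info_nce(protein_embs, [[0.0, 1.0], [1.0, 0.0]])
```

Averaging both directions keeps the objective from favoring one modality as the "query" side, which is the usual motivation for the symmetric form.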
