PAIR-Former: Budgeted Relational MIL for miRNA Target Prediction
This work addresses a domain-specific challenge in computational biology: making miRNA target prediction more compute-efficient. The contribution is incremental, since it builds on existing multi-instance learning and transformer frameworks.
The paper tackles miRNA-mRNA target prediction when only a limited number of candidate sites can be processed under a compute budget. It proposes PAIR-Former, which outperforms pooling baselines on miRAW at a practical budget of 64 sites and offers a controllable trade-off between accuracy and compute.
Functional miRNA--mRNA targeting is a large-bag prediction problem: each transcript yields a heavy-tailed pool of candidate target sites (CTSs), yet only a pair-level label is observed. We formalize this regime as \emph{Budgeted Relational Multi-Instance Learning (BR-MIL)}, where at most $K$ instances per bag may receive expensive encoding and relational processing under a hard compute budget. We propose \textbf{PAIR-Former} (Pool-Aware Instance-Relational Transformer), a BR-MIL pipeline that performs a cheap full-pool scan, selects up to $K$ diverse CTSs on CPU, and applies a permutation-invariant Set Transformer aggregator to the selected tokens. On miRAW, PAIR-Former outperforms strong pooling baselines at a practical operating budget ($K^\star{=}64$) while providing a controllable accuracy--compute trade-off as $K$ varies. We further provide theory linking budgeted selection to (i) approximation error decreasing with $K$ and (ii) generalization terms governed by $K$ in the expensive relational component.
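The scan, select, and aggregate pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `select_budgeted` and `mean_pool`, the `min_gap` spacing rule used as a proxy for diversity, and the toy scores are all assumptions introduced here, and plain mean pooling stands in for the Set Transformer only to show permutation invariance.

```python
from typing import List, Sequence


def select_budgeted(
    scores: Sequence[float],
    positions: Sequence[int],
    k: int,
    min_gap: int = 0,
) -> List[int]:
    """Pick at most k candidate sites by cheap score (hypothetical helper).

    Candidates are visited from highest to lowest score. A candidate is
    skipped if it lies closer than min_gap to an already chosen site, which
    is an assumed, simplistic stand-in for a diversity criterion.
    """
    # Sort indices by descending score; ties are broken by index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    chosen: List[int] = []
    for i in order:
        if len(chosen) >= k:  # hard budget: never exceed k instances
            break
        if all(abs(positions[i] - positions[j]) >= min_gap for j in chosen):
            chosen.append(i)
    return chosen


def mean_pool(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Permutation-invariant aggregation (stand-in for a Set Transformer)."""
    if not tokens:
        raise ValueError("cannot pool an empty set of tokens")
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


# Toy bag: five candidate sites with cheap scan scores and positions.
toy_scores = [0.9, 0.85, 0.2, 0.7, 0.1]
toy_positions = [10, 12, 40, 80, 200]
selected = select_budgeted(toy_scores, toy_positions, k=2, min_gap=20)

# Only the selected sites would receive expensive encoding; the embeddings
# below are made-up 2-d vectors for illustration.
toy_embeddings = [[1.0, 2.0], [0.0, 0.0], [5.0, 5.0], [3.0, 4.0], [9.0, 9.0]]
bag_vector = mean_pool([toy_embeddings[i] for i in selected])
```

In this toy bag the second-highest-scoring site is skipped because it sits two positions away from the top site, so the budget of two is spent on sites 0 and 3. Raising `k` processes more of the pool at higher cost, which mirrors the accuracy and compute trade-off the abstract attributes to varying $K$.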