Boltzmann Graph Ensemble Embeddings for Aptamer Libraries
This work addresses the challenge of identifying aptamers with high ligand affinity, particularly low-abundance candidates, though it appears incremental because it builds on existing graph-based methods.
The paper tackles the problem of predicting aptamer-ligand affinity from SELEX datasets, where experimental biases obscure true binding strengths, by introducing a Boltzmann-weighted ensemble embedding for molecules. The results show that this embedding enables robust community detection and subgraph-level explanations for affinity, even with biased observations.
Machine-learning methods in biochemistry commonly represent molecules as graphs of pairwise intermolecular interactions for property and structure prediction. Most methods operate on a single graph, typically the minimum free energy (MFE) structure, as a stand-in for the low-energy ensemble of conformations that represents the molecule at thermodynamic equilibrium. We introduce a thermodynamically parameterized exponential-family random graph model (ERGM) embedding that models molecules as Boltzmann-weighted ensembles of interaction graphs. We evaluate this embedding on SELEX datasets, where experimental biases (e.g., PCR amplification or sequencing noise) can obscure true aptamer-ligand affinity, producing anomalous candidates whose observed abundance diverges from their actual binding strength. We show that the proposed embedding enables robust community detection and subgraph-level explanations for aptamer-ligand affinity, even in the presence of biased observations. This approach may be used to identify low-abundance aptamer candidates for further experimental evaluation.
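To make the idea of a Boltzmann-weighted ensemble of interaction graphs concrete, the following is a minimal sketch and not the authors' ERGM embedding. It assumes each candidate structure is given as a list of paired positions with a free energy in kcal/mol, weights each structure by exp(-E/RT), and averages the adjacency matrices into a matrix of pairing probabilities. The function names, the toy structures, the energies, and the default temperature of 310.15 K are all hypothetical choices made for illustration.

```python
import math

# Gas constant in kcal/(mol*K). The 310.15 K default (37 degrees C) is an assumption.
R_KCAL = 0.0019872


def boltzmann_weights(energies, temperature=310.15):
    """Return normalized Boltzmann weights for free energies given in kcal/mol."""
    beta = 1.0 / (R_KCAL * temperature)
    e_min = min(energies)  # shift energies so the largest exponent is 0 (numerical stability)
    unnormalized = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(unnormalized)
    return [u / partition for u in unnormalized]


def expected_adjacency(n, structures, weights):
    """Weighted average of adjacency matrices; each structure is a list of (i, j) pairs."""
    adj = [[0.0] * n for _ in range(n)]
    for pairs, w in zip(structures, weights):
        for i, j in pairs:
            adj[i][j] += w
            adj[j][i] += w  # interactions are undirected, so keep the matrix symmetric
    return adj


# Hypothetical toy ensemble: three candidate structures of a 6-position molecule.
structures = [
    [(0, 5), (1, 4)],  # lowest energy, plays the role of the MFE structure
    [(0, 5)],
    [(1, 5)],
]
energies = [-3.0, -2.0, -1.5]  # hypothetical free energies in kcal/mol

weights = boltzmann_weights(energies)
pair_probs = expected_adjacency(6, structures, weights)
```

In this sketch the lowest-energy structure receives the largest weight but does not fully determine `pair_probs`: the entry for positions 0 and 5 accumulates weight from two structures, while the pair (1, 5) appears only through a higher-energy alternative that a single-graph MFE representation would discard. The Boltzmann distribution is itself an exponential-family distribution over graphs, with energy as the sufficient statistic and -1/RT as the natural parameter, which is the general sense in which an ERGM can be parameterized thermodynamically.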