LGAINEBMOct 27, 2025

Symbolic Neural Generation with Applications to Lead Discovery in Drug Design

arXiv:2510.23379v1h-index: 18
Originality Incremental advance
AI Analysis

This addresses the challenge of generating feasible data in domains like drug design, offering a hybrid approach that is incremental in combining existing symbolic and neural methods.

The paper tackles the problem of generating data that meets formal correctness criteria by introducing Symbolic Neural Generators (SNGs), which combine symbolic learning with neural reasoning, and applies this to drug design, where on exploratory problems, generated molecules show binding affinities comparable to leading clinical candidates.

We investigate a relatively underexplored class of hybrid neurosymbolic models integrating symbolic learning with neural reasoning to construct data generators meeting formal correctness criteria. In \textit{Symbolic Neural Generators} (SNGs), symbolic learners examine logical specifications of feasible data from a small set of instances -- sometimes just one. Each specification in turn constrains the conditional information supplied to a neural-based generator, which rejects any instance violating the symbolic specification. Like other neurosymbolic approaches, SNG exploits the complementary strengths of symbolic and neural methods. The outcome of an SNG is a triple $(H, X, W)$, where $H$ is a symbolic description of feasible instances constructed from data, $X$ a set of generated new instances that satisfy the description, and $W$ an associated weight. We introduce a semantics for such systems, based on the construction of appropriate \textit{base} and \textit{fibre} partially-ordered sets combined into an overall partial order, and outline a probabilistic extension relevant to practical applications. In this extension, SNGs result from searching over a weighted partial ordering. We implement an SNG combining a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems -- where drug targets are well understood -- SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems with poorly understood targets, generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, with several generated molecules identified as viable for synthesis and wet-lab testing.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes