89.0ITMar 26
Information-Theoretic Limits of Node Localization under Hybrid Graph Positional EncodingsZimo Yan, Zheng Xie, Chang Liu et al.
Positional encoding has become a standard component in graph learning, especially for graph Transformers and other models that must distinguish structurally similar nodes, yet its fundamental identifiability remains poorly understood. In this work, we study node localization under a hybrid positional encoding that combines anchor-distance profiles with quantized low-frequency spectral features. We cast localization as an observation-map problem whose difficulty is controlled by the number of distinct codes induced by the encoding and establish an information-theoretic converse identifying an impossibility regime jointly governed by the anchor number, spectral dimension, and quantization level. Experiments further support this picture: on random $3$-regular graphs, the empirical crossover is well organized by the predicted scaling, while on two real-world DDI graphs identifiability is strongly graph-dependent, with DrugBank remaining highly redundant under the tested encodings and the Decagon-derived graph becoming nearly injective under sufficiently rich spectral information. Overall, these results suggest that positional encoding should be understood not merely as a heuristic architectural component, but as a graph-dependent structural resolution mechanism.
ITJan 8
Bridging Distance and Spectral Positional Encodings via Anchor-Based Diffusion Geometry ApproximationZimo Yan, Zheng Xie, Runfan Duan et al.
Molecular graph learning benefits from positional signals that capture both local neighborhoods and global topology. Two widely used families are spectral encodings derived from Laplacian or diffusion operators and anchor-based distance encodings built from shortest-path information, yet their precise relationship is poorly understood. We interpret distance encodings as a low-rank surrogate of diffusion geometry and derive an explicit trilateration map that reconstructs truncated diffusion coordinates from transformed anchor distances and anchor spectral positions, with pointwise and Frobenius-gap guarantees on random regular graphs. On DrugBank molecular graphs using a shared GNP-based DDI prediction backbone, a distance-driven Nyström scheme closely recovers diffusion geometry, and both Laplacian and distance encodings substantially outperform a no-encoding baseline.
LGNov 24, 2025
Resolving Node Identifiability in Graph Neural Processes via Laplacian Spectral EncodingsZimo Yan, Zheng Xie, Chang Liu et al.
Message passing graph neural networks are widely used for learning on graphs, yet their expressive power is limited by the one-dimensional Weisfeiler-Lehman test and can fail to distinguish structurally different nodes. We provide rigorous theory for a Laplacian positional encoding that is invariant to eigenvector sign flips and to basis rotations within eigenspaces. We prove that this encoding yields node identifiability from a constant number of observations and establishes a sample-complexity separation from architectures constrained by the Weisfeiler-Lehman test. The analysis combines a monotone link between shortest-path and diffusion distance, spectral trilateration with a constant set of anchors, and quantitative spectral injectivity with logarithmic embedding size. As an instantiation, pairing this encoding with a neural-process style decoder yields significant gains on a drug-drug interaction task on chemical graphs, improving both the area under the ROC curve and the F1 score and demonstrating the practical benefits of resolving theoretical expressiveness limitations with principled positional information.
LGSep 18, 2025
A Multi-Scale Graph Neural Process with Cross-Drug Co-Attention for Drug-Drug Interactions PredictionZimo Yan, Jie Zhang, Zheng Xie et al.
Accurate prediction of drug-drug interactions (DDI) is crucial for medication safety and effective drug development. However, existing methods often struggle to capture structural information across different scales, from local functional groups to global molecular topology, and typically lack mechanisms to quantify prediction confidence. To address these limitations, we propose MPNP-DDI, a novel Multi-scale Graph Neural Process framework. The core of MPNP-DDI is a unique message-passing scheme that, by being iteratively applied, learns a hierarchy of graph representations at multiple scales. Crucially, a cross-drug co-attention mechanism then dynamically fuses these multi-scale representations to generate context-aware embeddings for interacting drug pairs, while an integrated neural process module provides principled uncertainty estimation. Extensive experiments demonstrate that MPNP-DDI significantly outperforms state-of-the-art baselines on benchmark datasets. By providing accurate, generalizable, and uncertainty-aware predictions built upon multi-scale structural features, MPNP-DDI represents a powerful computational tool for pharmacovigilance, polypharmacy risk assessment, and precision medicine.
LGApr 22, 2025
MetaMolGen: A Neural Graph Motif Generation Model for De Novo Molecular DesignZimo Yan, Jie Zhang, Zheng Xie et al.
Molecular generation plays an important role in drug discovery and materials science, especially in data-scarce scenarios where traditional generative models often struggle to achieve satisfactory conditional generalization. To address this challenge, we propose MetaMolGen, a first-order meta-learning-based molecular generator designed for few-shot and property-conditioned molecular generation. MetaMolGen standardizes the distribution of graph motifs by mapping them to a normalized latent space, and employs a lightweight autoregressive sequence model to generate SMILES sequences that faithfully reflect the underlying molecular structure. In addition, it supports conditional generation of molecules with target properties through a learnable property projector integrated into the generative process.Experimental results demonstrate that MetaMolGen consistently generates valid and diverse SMILES sequences under low-data regimes, outperforming conventional baselines. This highlights its advantage in fast adaptation and efficient conditional generation for practical molecular design.