Manasa Kaniselvan

CHEM-PH
h-index16
5papers
14citations
Novelty45%
AI Score47

5 Papers

CLNov 8, 2025
Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs

Renfei Zhang, Manasa Kaniselvan, Niloofar Mireshghallah

Reinforcement learning (RL) is often credited with improving language model reasoning and generalization at the expense of degrading memorized knowledge. We challenge this narrative by observing that RL-enhanced models consistently outperform their base and supervised fine-tuned (SFT) counterparts on pure knowledge recall tasks, particularly those requiring traversal of hierarchical, structured knowledge (e.g., medical codes). We hypothesize these gains stem not from newly acquired data, but from improved procedural skills in navigating and searching existing knowledge hierarchies within the model parameters. To support this hypothesis, we show that structured prompting, which explicitly guides SFTed models through hierarchical traversal, recovers most of the performance gap (reducing 24pp to 7pp on MedConceptsQA for DeepSeek-V3/R1). We further find that while prompting improves final-answer accuracy, RL-enhanced models retain superior ability to recall correct procedural paths on deep-retrieval tasks. Finally our layer-wise internal activation analysis reveals that while factual representations (e.g., activations for the statement "code 57.95 refers to urinary infection") maintain high cosine similarity between SFT and RL models, query representations (e.g., "what is code 57.95") diverge noticeably, indicating that RL primarily transforms how models traverse knowledge rather than the knowledge representation itself.

MTRL-SCIFeb 3
Acceleration of Atomistic NEGF: Algorithms, Parallelization, and Machine Learning

Mathieu Luisier, Nicolas Vetsch, Alexander Maeder et al.

The Non-equilibrium Green's function (NEGF) formalism is a particularly powerful method to simulate the quantum transport properties of nanoscale devices such as transistors, photo-diodes, or memory cells, in the ballistic limit of transport or in the presence of various scattering sources such as electronphonon, electron-photon, or even electron-electron interactions. The inclusion of all these mechanisms has been first demonstrated in small systems, composed of a few atoms, before being scaled up to larger structures made of thousands of atoms. Also, the accuracy of the models has kept improving, from empirical to fully ab-initio ones, e.g., density functional theory (DFT). This paper summarizes key (algorithmic) achievements that have allowed us to bring DFT+NEGF simulations closer to the dimensions and functionality of realistic systems. The possibility of leveraging graph neural networks and machine learning to speed up ab-initio device simulations is discussed as well.

CHEM-PHOct 13, 2025
Enhancing Diffusion-Based Sampling with Molecular Collective Variables

Juno Nam, Bálint Máté, Artur P. Toshev et al.

Diffusion-based samplers learn to sample complex, high-dimensional distributions using energies or log densities alone, without training data. Yet, they remain impractical for molecular sampling because they are often slower than molecular dynamics and miss thermodynamically relevant modes. Inspired by enhanced sampling, we encourage exploration by introducing a sequential bias along bespoke, information-rich, low-dimensional projections of atomic coordinates known as collective variables (CVs). We introduce a repulsive potential centered on the CVs from recent samples, which pushes future samples towards novel CV regions and effectively increases the temperature in the projected space. Our resulting method improves efficiency, mode discovery, enables the estimation of free energy differences, and retains independent sampling from the approximate Boltzmann distribution via reweighting by the bias. On standard peptide conformational sampling benchmarks, the method recovers diverse conformational states and accurate free energy profiles. We are the first to demonstrate reactive sampling using a diffusion-based sampler, capturing bond breaking and formation with universal interatomic potentials at near-first-principles accuracy. The approach resolves reactive energy landscapes at a fraction of the wall-clock time of standard sampling methods, advancing diffusion-based sampling towards practical use in molecular sciences.

CHEM-PHSep 30, 2025
Learning from the electronic structure of molecules across the periodic table

Manasa Kaniselvan, Benjamin Kurt Miller, Meng Gao et al.

Machine-Learned Interatomic Potentials (MLIPs) require vast amounts of atomic structure data to learn forces and energies, and their performance continues to improve with training set size. Meanwhile, the even greater quantities of accompanying data in the Hamiltonian matrix H behind these datasets has so far gone unused for this purpose. Here, we provide a recipe for integrating the orbital interaction data within H towards training pipelines for atomic-level properties. We first introduce HELM ("Hamiltonian-trained Electronic-structure Learning for Molecules"), a state-of-the-art Hamiltonian prediction model which bridges the gap between Hamiltonian prediction and universal MLIPs by scaling to H of structures with 100+ atoms, high elemental diversity, and large basis sets including diffuse functions. To accompany HELM, we release a curated Hamiltonian matrix dataset, 'OMol_CSH_58k', with unprecedented elemental diversity (58 elements), molecular size (up to 150 atoms), and basis set (def2-TZVPD). Finally, we introduce 'Hamiltonian pretraining' as a method to extract meaningful descriptors of atomic environments even from a limited number atomic structures, and repurpose this shared embedding space to improve performance on energy-prediction in low-data regimes. Our results highlight the use of electronic interactions as a rich and transferable data source for representing chemical space.

LGJul 4, 2025
Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction

Manasa Kaniselvan, Alexander Maeder, Chen Hao Xia et al.

Equivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales, enabling investigation of the electronic properties of materials with extended defects, interfaces, or exhibiting disordered phases. However, as interactions between atomic orbitals typically extend over 10+ angstroms, the graph representations required for this task tend to be densely connected, and the memory requirements to perform training and inference on these large structures can exceed the limits of modern GPUs. Here we present a distributed eGNN implementation which leverages direct GPU communication and introduce a partitioning strategy of the input graph to reduce the number of embedding exchanges between GPUs. Our implementation shows strong scaling up to 128 GPUs, and weak scaling up to 512 GPUs with 87% parallel efficiency for structures with 3,000 to 190,000 atoms on the Alps supercomputer.