Aditya Nandy

CHEM-PH
h-index26
15papers
453citations
Novelty41%
AI Score40

15 Papers

MTRL-SCIMay 6, 2022
Putting Density Functional Theory to the Test in Machine-Learning-Accelerated Materials Discovery

Chenru Duan, Fang Liu, Aditya Nandy et al.

Accelerated discovery with machine learning (ML) has begun to provide the advances in efficiency needed to overcome the combinatorial challenge of computational materials design. Nevertheless, ML-accelerated discovery both inherits the biases of training data derived from density functional theory (DFT) and leads to many attempted calculations that are doomed to fail. Many compelling functional materials and catalytic processes involve strained chemical bonds, open-shell radicals and diradicals, or metal-organic bonds to open-shell transition-metal centers. Although promising targets, these materials present unique challenges for electronic structure methods and combinatorial challenges for their discovery. In this Perspective, we describe the advances needed in accuracy, efficiency, and approach beyond what is typical in conventional DFT-based ML workflows. These challenges have begun to be addressed through ML models trained to predict the results of multiple methods or the differences between them, enabling quantitative sensitivity analysis. For DFT to be trusted for a given data point in a high-throughput screen, it must pass a series of tests. ML models that predict the likelihood of calculation success and detect the presence of strong correlation will enable rapid diagnoses and adaptation strategies. These "decision engines" represent the first steps toward autonomous workflows that avoid the need for expert determination of the robustness of DFT-based materials discoveries.

MTRL-SCIOct 25, 2022
A Database of Ultrastable MOFs Reassembled from Stable Fragments with Machine Learning Models

Aditya Nandy, Shuwen Yue, Changhwan Oh et al.

High-throughput screening of large hypothetical databases of metal-organic frameworks (MOFs) can uncover new materials, but their stability in real-world applications is often unknown. We leverage community knowledge and machine learning (ML) models to identify MOFs that are thermally stable and stable upon activation. We separate these MOFs into their building blocks and recombine them to make a new hypothetical MOF database of over 50,000 structures that samples orders of magnitude more connectivity nets and inorganic building blocks than prior databases. This database shows an order of magnitude enrichment of ultrastable MOF structures that are stable upon activation and more than one standard deviation more thermally stable than the average experimentally characterized MOF. For the nearly 10,000 ultrastable MOFs, we compute bulk elastic moduli to confirm these materials have good mechanical stability, and we report methane deliverable capacities. Our work identifies privileged metal nodes in ultrastable MOFs that optimize gas storage and mechanical stability simultaneously.

CHEM-PHMar 2, 2022
Machine learning models predict calculation outcomes with the transferability necessary for computational catalysis

Chenru Duan, Aditya Nandy, Husain Adamji et al.

Virtual high throughput screening (VHTS) and machine learning (ML) have greatly accelerated the design of single-site transition-metal catalysts. VHTS of catalysts, however, is often accompanied with high calculation failure rate and wasted computational resources due to the difficulty of simultaneously converging all mechanistically relevant reactive intermediates to expected geometries and electronic states. We demonstrate a dynamic classifier approach, i.e., a convolutional neural network that monitors geometry optimization on the fly, and exploit its good performance and transferability for catalyst design. We show that the dynamic classifier performs well on all reactive intermediates in the representative catalytic cycle of the radical rebound mechanism for methane-to-methanol despite being trained on only one reactive intermediate. The dynamic classifier also generalizes to chemically distinct intermediates and metal centers absent from the training data without loss of accuracy or model confidence. We rationalize this superior model transferability to the use of on-the-fly electronic structure and geometric information generated from density functional theory calculations and the convolutional layer in the dynamic classifier. Combined with model uncertainty quantification, the dynamic classifier saves more than half of the computational resources that would have been wasted on unsuccessful calculations for all reactive intermediates being considered.

CHEM-PHAug 10, 2022
Active Learning Exploration of Transition Metal Complexes to Discover Method-Insensitive and Synthetically Accessible Chromophores

Chenru Duan, Aditya Nandy, Gianmarco Terrones et al.

Transition metal chromophores with earth-abundant transition metals are an important design target for their applications in lighting and non-toxic bioimaging, but their design is challenged by the scarcity of complexes that simultaneously have optimal target absorption energies in the visible region as well as well-defined ground states. Machine learning (ML) accelerated discovery could overcome such challenges by enabling screening of a larger space, but is limited by the fidelity of the data used in ML model training, which is typically from a single approximate density functional. To address this limitation, we search for consensus in predictions among 23 density functional approximations across multiple rungs of Jacobs ladder. To accelerate the discovery of complexes with absorption energies in the visible region while minimizing MR character, we use 2D efficient global optimization to sample candidate low-spin chromophores from multi-million complex spaces. Despite the scarcity (i.e., approx. 0.01\%) of potential chromophores in this large chemical space, we identify candidates with high likelihood (i.e., > 10\%) of computational validation as the ML models improve during active learning, representing a 1,000-fold acceleration in discovery. Absorption spectra of promising chromophores from time-dependent density functional theory verify that 2/3 of candidates have the desired excited state properties. The observation that constituent ligands from our leads have demonstrated interesting optical properties in the literature exemplifies the effectiveness of our construction of a realistic design space and active learning approach.

CHEM-PHJul 21, 2022
A Transferable Recommender Approach for Selecting the Best Density Functional Approximations in Chemical Discovery

Chenru Duan, Aditya Nandy, Ralf Meyer et al.

Approximate density functional theory (DFT) has become indispensable owing to its cost-accuracy trade-off in comparison to more computationally demanding but accurate correlated wavefunction theory. To date, however, no single density functional approximation (DFA) with universal accuracy has been identified, leading to uncertainty in the quality of data generated from DFT. With electron density fitting and transfer learning, we build a DFA recommender that selects the DFA with the lowest expected error with respect to gold standard but cost-prohibitive coupled cluster theory in a system-specific manner. We demonstrate this recommender approach on vertical spin-splitting energy evaluation for challenging transition metal complexes. Our recommender predicts top-performing DFAs and yields excellent accuracy (ca. 2 kcal/mol) for chemical discovery, outperforming both individual transfer learning models and the single best functional in a set of 48 DFAs. We demonstrate the transferability of the DFA recommender to experimentally synthesized compounds with distinct chemistry.

AIDec 17, 2025
Evaluating Large Language Models in Scientific Discovery

Zhangde Song, Jieyu Lu, Yuanqi Du et al.

Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain experts define research projects of genuine interest and decompose them into modular research scenarios from which vetted questions are sampled. The framework assesses models at two levels: (i) question-level accuracy on scenario-tied items and (ii) project-level performance, where models must propose testable hypotheses, design simulations or experiments, and interpret results. Applying this two-phase scientific discovery evaluation (SDE) framework to state-of-the-art LLMs reveals a consistent performance gap relative to general science benchmarks, diminishing return of scaling up model sizes and reasoning, and systematic weaknesses shared across top-tier models from different providers. Large performance variation in research scenarios leads to changing choices of the best performing model on scientific discovery projects evaluated, suggesting all current LLMs are distant to general scientific "superintelligence". Nevertheless, LLMs already demonstrate promise in a great variety of scientific discovery projects, including cases where constituent scenario scores are low, highlighting the role of guided exploration and serendipity in discovery. This SDE framework offers a reproducible benchmark for discovery-relevant evaluation of LLMs and charts practical paths to advance their development toward scientific discovery.

CHEM-PHSep 18, 2022
Low-cost machine learning approach to the prediction of transition metal phosphor excited state properties

Gianmarco Terrones, Chenru Duan, Aditya Nandy et al.

Photoactive iridium complexes are of broad interest due to their applications ranging from lighting to photocatalysis. However, the excited state property prediction of these complexes challenges ab initio methods such as time-dependent density functional theory (TDDFT) both from an accuracy and a computational cost perspective, complicating high throughput virtual screening (HTVS). We instead leverage low-cost machine learning (ML) models to predict the excited state properties of photoactive iridium complexes. We use experimental data of 1,380 iridium complexes to train and evaluate the ML models and identify the best-performing and most transferable models to be those trained on electronic structure features from low-cost density functional theory tight binding calculations. Using these models, we predict the three excited state properties considered, mean emission energy of phosphorescence, excited state lifetime, and emission spectral integral, with accuracy competitive with or superseding TDDFT. We conduct feature importance analysis to identify which iridium complex attributes govern excited state properties and we validate these trends with explicit examples. As a demonstration of how our ML models can be used for HTVS and the acceleration of chemical discovery, we curate a set of novel hypothetical iridium complexes and identify promising ligands for the design of new phosphors.

LGNov 11, 2024
Score-based generative diffusion with "active" correlated noise sources

Alexandra Lamtyugina, Agnish Kumar Behera, Aditya Nandy et al.

Diffusion models exhibit robust generative properties by approximating the underlying distribution of a dataset and synthesizing data by sampling from the approximated distribution. In this work, we explore how the generative performance may be be modulated if noise sources with temporal correlations -- akin to those used in the field of active matter -- are used for the destruction of the data in the forward process. Our numerical and analytical experiments suggest that the corresponding reverse process may exhibit improved generative properties.

MTRL-SCIAug 15, 2025
The Rise of Generative AI for Metal-Organic Framework Design and Synthesis

Chenru Duan, Aditya Nandy, Shyam Chand Pal et al.

Advances in generative artificial intelligence are transforming how metal-organic frameworks (MOFs) are designed and discovered. This Perspective introduces the shift from laborious enumeration of MOF candidates to generative approaches that can autonomously propose and synthesize in the laboratory new porous reticular structures on demand. We outline the progress of employing deep learning models, such as variational autoencoders, diffusion models, and large language model-based agents, that are fueled by the growing amount of available data from the MOF community and suggest novel crystalline materials designs. These generative tools can be combined with high-throughput computational screening and even automated experiments to form accelerated, closed-loop discovery pipelines. The result is a new paradigm for reticular chemistry in which AI algorithms more efficiently direct the search for high-performance MOF materials for clean air and energy applications. Finally, we highlight remaining challenges such as synthetic feasibility, dataset diversity, and the need for further integration of domain knowledge.

CHEM-PHMay 13, 2025
Building-Block Aware Generative Modeling for 3D Crystals of Metal Organic Frameworks

Chenru Duan, Aditya Nandy, Sizhan Liu et al.

Metal-organic frameworks (MOFs) marry inorganic nodes, organic edges, and topological nets into programmable porous crystals, yet their astronomical design space defies brute-force synthesis. Generative modeling holds ultimate promise, but existing models either recycle known building blocks or are restricted to small unit cells. We introduce Building-Block-Aware MOF Diffusion (BBA MOF Diffusion), an SE(3)-equivariant diffusion model that learns 3D all-atom representations of individual building blocks, encoding crystallographic topological nets explicitly. Trained on the CoRE-MOF database, BBA MOF Diffusion readily samples MOFs with unit cells containing 1000 atoms with great geometric validity, novelty, and diversity mirroring experimental databases. Its native building-block representation produces unprecedented metal nodes and organic edges, expanding accessible chemical space by orders of magnitude. One high-scoring [Zn(1,4-TDC)(EtOH)2] MOF predicted by the model was synthesized, where powder X-ray diffraction, thermogravimetric analysis, and N2 sorption confirm its structural fidelity. BBA-Diff thus furnishes a practical pathway to synthesizable and high-performing MOFs.

CHEM-PHJan 11, 2022
Two Wrongs Can Make a Right: A Transfer Learning Approach for Chemical Discovery with Chemical Accuracy

Chenru Duan, Daniel B. K. Chu, Aditya Nandy et al.

Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high throughput screening (VHTS). Nevertheless, most VHTS is carried out with approximate density functional theory (DFT) using a single functional. Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates MR effect on chemical property prediction is not well established. We evaluate MR diagnostics of over 10,000 transition metal complexes (TMCs) and compare to those in organic molecules. We reveal that only some MR diagnostics are transferable across these materials spaces. By studying the influence of MR character on chemical properties (i.e., MR effect) that involves multiple potential energy surfaces (i.e., adiabatic spin splitting, $ΔE_\mathrm{H-L}$, and ionization potential, IP), we observe that cancellation in MR effect outweighs accumulation. Differences in MR character are more important than the total degree of MR character in predicting MR effect in property prediction. Motivated by this observation, we build transfer learning models to directly predict CCSD(T)-level adiabatic $ΔE_\mathrm{H-L}$ and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving chemical accuracy (i.e., 1 kcal/mol) for robust VHTS.

CHEM-PHNov 2, 2021
Audacity of huge: overcoming challenges of data scarcity and data quality for machine learning in computational materials discovery

Aditya Nandy, Chenru Duan, Heather J. Kulik

Machine learning (ML)-accelerated discovery requires large amounts of high-fidelity data to reveal predictive structure-property relationships. For many properties of interest in materials discovery, the challenging nature and high cost of data generation has resulted in a data landscape that is both scarcely populated and of dubious quality. Data-driven techniques starting to overcome these limitations include the use of consensus across functionals in density functional theory, the development of new functionals or accelerated electronic structure theories, and the detection of where computationally demanding methods are most necessary. When properties cannot be reliably simulated, large experimental data sets can be used to train ML models. In the absence of manual curation, increasingly sophisticated natural language processing and automated image analysis are making it possible to learn structure-property relationships from the literature. Models trained on these data sets will improve as they incorporate community feedback.

MTRL-SCIJul 29, 2021
Deciphering Cryptic Behavior in Bimetallic Transition Metal Complexes with Machine Learning

Michael G. Taylor, Aditya Nandy, Connie C. Lu et al.

The rational tailoring of transition metal complexes is necessary to address outstanding challenges in energy utilization and storage. Heterobimetallic transition metal complexes that exhibit metal-metal bonding in stacked "double decker" ligand structures are an emerging, attractive platform for catalysis, but their properties are challenging to predict prior to laborious synthetic efforts. We demonstrate an alternative, data-driven approach to uncovering structure-property relationships for rational bimetallic complex design. We tailor graph-based representations of the metal-local environment for these heterobimetallic complexes for use in training of multiple linear regression and kernel ridge regression (KRR) models. Focusing on oxidation potentials, we obtain a set of 28 experimentally characterized complexes to develop a multiple linear regression model. On this training set, we achieve good accuracy (mean absolute error, MAE, of 0.25 V) and preserve transferability to unseen experimental data with a new ligand structure. We trained a KRR model on a subset of 330 structurally characterized heterobimetallics to predict the degree of metal-metal bonding. This KRR model predicts relative metal-metal bond lengths in the test set to within 5%, and analysis of key features reveals the fundamental atomic contributions (e.g., the valence electron configuration) that most strongly influence the behavior of complexes. Our work provides guidance for rational bimetallic design, suggesting that properties including the formal shortness ratio should be transferable from one period to another.

MTRL-SCIJun 24, 2021
Using Machine Learning and Data Mining to Leverage Community Knowledge for the Engineering of Stable Metal-Organic Frameworks

Aditya Nandy, Chenru Duan, Heather J. Kulik

Although the tailored metal active sites and porous architectures of MOFs hold great promise for engineering challenges ranging from gas separations to catalysis, a lack of understanding of how to improve their stability limits their use in practice. To overcome this limitation, we extract thousands of published reports of the key aspects of MOF stability necessary for their practical application: the ability to withstand high temperatures without degrading and the capacity to be activated by removal of solvent molecules. From nearly 4,000 manuscripts, we use natural language processing and automated image analysis to obtain over 2,000 solvent-removal stability measures and 3,000 thermal degradation temperatures. We analyze the relationships between stability properties and the chemical and geometric structures in this set to identify limits of prior heuristics derived from smaller sets of MOFs. By training predictive machine learning (ML, i.e., Gaussian process and artificial neural network) models to encode the structure-property relationships with graph- and pore-structure-based representations, we are able to make predictions of stability orders of magnitude faster than conventional physics-based modeling or experiment. Interpretation of important features in ML models provides insights that we use to identify strategies to engineer increased stability into typically unstable 3d-containing MOFs that are frequently targeted for catalytic applications. We expect our approach to accelerate the time to discovery of stable, practical MOF materials for a wide range of applications.

CHEM-PHJun 20, 2021
Representations and Strategies for Transferable Machine Learning Models in Chemical Discovery

Daniel R. Harper, Aditya Nandy, Naveen Arunachalam et al.

Strategies for machine-learning(ML)-accelerated discovery that are general across materials composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets like open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (ca. 1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the periodic table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the effective nuclear charge alongside the nuclear charge heuristic that otherwise overestimates dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data is limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the periodic table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the periodic table, a property we expect to be broadly useful for other materials domains.