80.0AIMay 18Code
SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational ScienceNithin Somasekharan, Youssef Hassan, Shiyao Lin et al.
Large Language Models (LLMs) are increasingly deployed as scientific AI as- sistants, and a growing body of benchmarks evaluates their capabilities across knowledge retrieval, reasoning, code generation, and tool use. These evaluations, however, typically assume the scientific problem is already well-posed, whereas practical scientific assistance often begins with an ill-posed user request that must be refined through dialogue before any computation, analysis, or experiment can be carried out reliably. We introduce SCICONVBENCH, a benchmark for multi- turn clarification in scientific task formulation across four computational science problem domains: fluid mechanics, solid mechanics, materials science, and par- tial differential equations (PDEs). SCICONVBENCH targets two complementary capabilities: eliciting missing information (disambiguation) and detecting and correcting erroneous requests containing internally contradictory information (in- consistency resolution). Our benchmark pairs a structured task ontology with a rubric-based evaluation framework, enabling systematic measurement of LLM per- formance across three dimensions: clarification behavior, conversational grounding, and final-specification fidelity. Current frontier models perform relatively well on inconsistency resolution, but even the best model resolves only 52.7% of the disambiguation cases in fluid mechanics. We further find that frontier LLMs fre- quently make silent assumptions and perform implicit specification repairs that are not grounded in the conversation with users. SCICONVBENCH establishes a foundation for evaluating the upstream conversational reasoning that a reliable computational science assistant requires. The code and data can be found at https://github.com/csml-rpi/SciConvBench.
MTRL-SCIJan 17, 2023
Outlier-Based Domain of Applicability Identification for Materials Property Prediction ModelsGihan Panapitiya, Emily Saldanha
Machine learning models have been widely applied for material property prediction. However, practical application of these models can be hindered by a lack of information about how well they will perform on previously unseen types of materials. Because machine learning model predictions depend on the quality of the available training data, different domains of the material feature space are predicted with different accuracy levels by such models. The ability to identify such domains enables the ability to find the confidence level of each prediction, to determine when and how the model should be employed depending on the prediction accuracy requirements of different tasks, and to improve the model for domains with high errors. In this work, we propose a method to find domains of applicability using a large feature space and also introduce analysis techniques to gain more insight into the detected domains and subdomains.
LGDec 20, 2022
Dynamic Molecular Graph-based Implementation for Biophysical Properties PredictionCarter Knutson, Gihan Panapitiya, Rohith Varikoti et al.
Neural Networks (GNNs) have revolutionized the molecular discovery to understand patterns and identify unknown features that can aid in predicting biophysical properties and protein-ligand interactions. However, current models typically rely on 2-dimensional molecular representations as input, and while utilization of 2\3- dimensional structural data has gained deserved traction in recent years as many of these models are still limited to static graph representations. We propose a novel approach based on the transformer model utilizing GNNs for characterizing dynamic features of protein-ligand interactions. Our message passing transformer pre-trains on a set of molecular dynamic data based off of physics-based simulations to learn coordinate construction and make binding probability and affinity predictions as a downstream task. Through extensive testing we compare our results with the existing models, our MDA-PLI model was able to outperform the molecular interaction prediction models with an RMSE of 1.2958. The geometric encodings enabled by our transformer architecture and the addition of time series data add a new dimensionality to this form of research.
AISep 30, 2025Code
AutoLabs: Cognitive Multi-Agent Systems with Self-Correction for Autonomous Chemical ExperimentationGihan Panapitiya, Emily Saldanha, Heather Job et al.
The automation of chemical research through self-driving laboratories (SDLs) promises to accelerate scientific discovery, yet the reliability and granular performance of the underlying AI agents remain critical, under-examined challenges. In this work, we introduce AutoLabs, a self-correcting, multi-agent architecture designed to autonomously translate natural-language instructions into executable protocols for a high-throughput liquid handler. The system engages users in dialogue, decomposes experimental goals into discrete tasks for specialized agents, performs tool-assisted stoichiometric calculations, and iteratively self-corrects its output before generating a hardware-ready file. We present a comprehensive evaluation framework featuring five benchmark experiments of increasing complexity, from simple sample preparation to multi-plate timed syntheses. Through a systematic ablation study of 20 agent configurations, we assess the impact of reasoning capacity, architectural design (single- vs. multi-agent), tool use, and self-correction mechanisms. Our results demonstrate that agent reasoning capacity is the most critical factor for success, reducing quantitative errors in chemical amounts (nRMSE) by over 85% in complex tasks. When combined with a multi-agent architecture and iterative self-correction, AutoLabs achieves near-expert procedural accuracy (F1-score > 0.89) on challenging multi-step syntheses. These findings establish a clear blueprint for developing robust and trustworthy AI partners for autonomous laboratories, highlighting the synergistic effects of modular design, advanced reasoning, and self-correction to ensure both performance and reliability in high-stakes scientific applications. Code: https://github.com/pnnl/autolabs
LGOct 16, 2024
FragNet: A Graph Neural Network for Molecular Property Prediction with Four Levels of InterpretabilityGihan Panapitiya, Peiyuan Gao, C Mark Maupin et al.
Molecular property prediction is essential in a variety of contemporary scientific fields, such as drug development and designing energy storage materials. Although there are many machine learning models available for this purpose, those that achieve high accuracy while also offering interpretability of predictions are uncommon. We present a graph neural network that not only matches the prediction accuracies of leading models but also provides insights on four levels of molecular substructures. This model helps identify which atoms, bonds, molecular fragments, and connections between fragments are significant for predicting a specific molecular property. Understanding the importance of connections between fragments is particularly valuable for molecules with substructures that do not connect through standard bonds. The model additionally can quantify the impact of specific fragments on the prediction, allowing the identification of fragments that may improve or degrade a property value. These interpretable features are essential for deriving scientific insights from the model's learned relationships between molecular structures and properties.
MTRL-SCIMay 26, 2021
Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular RepresentationsGihan Panapitiya, Michael Girard, Aaron Hollas et al.
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goal of this study is to develop a general model capable of predicting the solubility of a broad range of organic molecules. Using the largest currently available solubility dataset, we implement deep learning-based models to predict solubility from molecular structure and explore several different molecular representations including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional (3D) atomic coordinates using four different neural network architectures - fully connected neural networks (FCNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, perform feature analysis to understand which information about molecular structure is most valuable for prediction, and perform a transfer learning and data size study to understand the impact of data availability on model performance.