Raymundo Arróyave

LG
h-index: 12
6 papers
72 citations
Novelty: 53%
AI Score: 43

6 Papers

LG · Sep 22, 2024
Supply Risk-Aware Alloy Discovery and Design

Mrinalini Mulukutla, Robert Robinson, Danial Khatamsaz et al.

Materials design is a critical driver of innovation, yet overlooking the technological, economic, and environmental risks inherent in materials and their supply chains can lead to unsustainable and risk-prone solutions. To address this, we present a novel risk-aware design approach that integrates Supply-Chain Aware Design Strategies into the materials development process. This approach leverages existing language models and text analysis to develop a specialized model for predicting materials feedstock supply risk indices. To efficiently navigate the multi-objective, multi-constraint design space, we employ Batch Bayesian Optimization (BBO), enabling the identification of Pareto-optimal high entropy alloys (HEAs) that balance performance objectives with minimized supply risk. A case study using the MoNbTiVW system demonstrates the efficacy of our approach in four scenarios, highlighting the significant impact of incorporating supply risk into the design process. By optimizing for both performance and supply risk, we ensure that the developed alloys are not only high-performing but also sustainable and economically viable. This integrated approach represents a critical step towards a future where materials discovery and design seamlessly consider sustainability, supply chain dynamics, and comprehensive life cycle analysis.
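The trade-off between performance and supply risk described above comes down to finding non-dominated (Pareto-optimal) candidates. The following is a minimal sketch of that filtering step only, not the authors' Batch Bayesian Optimization pipeline. The function name `pareto_mask` and the candidate values are hypothetical and chosen purely for illustration.

```python
import numpy as np

def pareto_mask(costs: np.ndarray) -> np.ndarray:
    """Return a boolean mask of non-dominated rows; every column is minimized."""
    n = costs.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Row j dominates row i if j is no worse in every column
        # and strictly better in at least one.
        no_worse = np.all(costs <= costs[i], axis=1)
        better = np.any(costs < costs[i], axis=1)
        if np.any(no_worse & better):
            mask[i] = False
    return mask

# Hypothetical alloy candidates (illustrative numbers, not data from the paper).
performance = np.array([0.90, 0.85, 0.70, 0.60, 0.88])  # to be maximized
supply_risk = np.array([0.80, 0.40, 0.45, 0.20, 0.90])  # to be minimized

# Negate performance so that both columns are minimized.
costs = np.column_stack([-performance, supply_risk])
front = pareto_mask(costs)
print(front)
```

Candidates 2 and 4 (zero-based) are dropped because another candidate has both higher performance and lower supply risk. The remaining three form the trade-off curve a designer would choose from.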

LG · Jan 8
The Kernel Manifold: A Geometric Approach to Gaussian Process Model Selection

Md Shafiqul Islam, Shakti Prasad Padhy, Douglas Allaire et al.

Gaussian Process (GP) regression is a powerful nonparametric Bayesian framework, but its performance depends critically on the choice of covariance kernel. Selecting an appropriate kernel is therefore central to model quality, yet remains one of the most challenging and computationally expensive steps in probabilistic modeling. We present a Bayesian optimization framework built on kernel-of-kernels geometry, using expected divergence-based distances between GP priors to explore kernel space efficiently. A multidimensional scaling (MDS) embedding of this distance matrix maps a discrete kernel library into a continuous Euclidean manifold, enabling smooth BO. In this formulation, the input space comprises kernel compositions, the objective is the log marginal likelihood, and featurization is given by the MDS coordinates. When the divergence yields a valid metric, the embedding preserves geometry and produces a stable BO landscape. We demonstrate the approach on synthetic benchmarks, real-world time-series datasets, and an additive manufacturing case study predicting melt-pool geometry, achieving superior predictive accuracy and uncertainty calibration relative to baselines including Large Language Model (LLM)-guided search. This framework establishes a reusable probabilistic geometry for kernel search, with direct relevance to GP modeling and deep kernel learning.
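The embedding step in this abstract, mapping a pairwise distance matrix over a discrete kernel library to Euclidean coordinates, can be sketched with classical (Torgerson) MDS. Which MDS variant the authors use is not stated here, so the classical form is an assumption. The distance matrix below is built from known 2D points as a stand-in for divergence-based kernel distances, and `classical_mds` is a hypothetical helper name.

```python
import numpy as np

def classical_mds(dist: np.ndarray, dim: int = 2) -> np.ndarray:
    """Embed n items in `dim` dimensions from an (n, n) distance matrix."""
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the eigenpairs with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]
    # Clip small negative eigenvalues that arise when the distances
    # are not exactly Euclidean.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Stand-in "kernel library": four items whose distances are exactly Euclidean.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 2.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist, dim=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(recovered, dist))
```

When the input distances are a valid Euclidean metric, as in this toy case, the embedding reproduces them up to rotation and reflection. This is the geometry-preserving property the abstract refers to. The resulting coordinates could then serve as continuous features for a Bayesian optimization loop.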

LG · Jan 12
Simulated Annealing-based Candidate Optimization for Batch Acquisition Functions

Sk Md Ahnaf Akif Alvi, Raymundo Arróyave, Douglas Allaire

Bayesian Optimization with multi-objective acquisition functions such as q-Expected Hypervolume Improvement (qEHVI) requires efficient candidate optimization to maximize acquisition function values. Traditional approaches rely on continuous optimization methods like Sequential Least Squares Programming (SLSQP) for candidate selection. However, these gradient-based methods can become trapped in local optima, particularly in complex or high-dimensional objective landscapes. This paper presents a simulated annealing-based approach for candidate optimization in batch acquisition functions as an alternative to conventional continuous optimization methods. We evaluate our simulated annealing approach against SLSQP across four benchmark multi-objective optimization problems: ZDT1 (30D, 2 objectives), DTLZ2 (7D, 3 objectives), Kursawe (3D, 2 objectives), and Latent-Aware (4D, 2 objectives). Our results demonstrate that simulated annealing achieves higher hypervolume than SLSQP on most test functions. The improvement is particularly pronounced for the DTLZ2 and Latent-Aware problems, where simulated annealing reaches significantly higher hypervolume values and maintains better convergence characteristics. The histogram analysis of objective space coverage further reveals that simulated annealing explores more diverse and optimal regions of the Pareto front. These findings suggest that metaheuristic optimization approaches like simulated annealing can provide more robust and effective candidate optimization for multi-objective Bayesian optimization, offering a promising alternative to traditional gradient-based methods for batch acquisition function optimization.
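To illustrate the general idea of replacing a gradient-based inner optimizer with simulated annealing, here is a minimal sketch that maximizes a function over the unit box. It is not the authors' implementation. `toy_acquisition` is a hypothetical multimodal stand-in for an acquisition function such as qEHVI, and the geometric cooling schedule, Gaussian proposal, and all parameter values are assumptions.

```python
import math
import random

def toy_acquisition(x):
    """Hypothetical multimodal stand-in for an acquisition function on [0, 1]^d."""
    return sum(math.sin(5.0 * math.pi * xi) * (1.0 - xi) for xi in x)

def simulated_annealing(objective, x0, n_steps=2000, t_start=1.0, t_end=1e-3,
                        step_size=0.1, seed=0):
    """Maximize `objective` over [0, 1]^d, starting from x0."""
    rng = random.Random(seed)
    current = list(x0)
    current_val = objective(current)
    best, best_val = list(current), current_val
    for k in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        # Gaussian perturbation, clipped back into the unit box.
        proposal = [min(1.0, max(0.0, xi + rng.gauss(0.0, step_size)))
                    for xi in current]
        proposal_val = objective(proposal)
        delta = proposal_val - current_val
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp) (Metropolis criterion).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_val = proposal, proposal_val
            if current_val > best_val:
                best, best_val = list(current), current_val
    return best, best_val

x0 = [0.9, 0.9, 0.9]
best_x, best_val = simulated_annealing(toy_acquisition, x0)
print(best_x, best_val)
```

For a batch of q candidates in d dimensions, one option is to flatten the batch into a single vector of length q·d and anneal over that joint vector. Accepting occasional downhill moves at high temperature is what lets the search leave a local optimum where a gradient-based method would stall.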

MTRL-SCI · Nov 22, 2024
Accelerating CALPHAD-based Phase Diagram Predictions in Complex Alloys Using Universal Machine Learning Potentials: Opportunities and Challenges

Siya Zhu, Raymundo Arróyave, Doğuhan Sarıtürk

Accurate phase diagram prediction is crucial for understanding alloy thermodynamics and advancing materials design. While traditional CALPHAD methods are robust, they are resource-intensive and limited by experimentally assessed data. This work explores the use of machine learning interatomic potentials (MLIPs) such as M3GNet, CHGNet, MACE, SevenNet, and ORB to significantly accelerate phase diagram calculations by using the Alloy Theoretic Automated Toolkit (ATAT) to map calculations of the energies and free energies of atomistic systems to CALPHAD-compatible thermodynamic descriptions. Using case studies including Cr-Mo, Cu-Au, and Pt-W, we demonstrate that MLIPs, particularly ORB, achieve computational speedups exceeding three orders of magnitude compared to DFT while maintaining phase stability predictions within acceptable accuracy. Extending this approach to liquid phases and ternary systems like Cr-Mo-V highlights its versatility for high-entropy alloys and complex chemical spaces. This work demonstrates that MLIPs, integrated with tools like ATAT within a CALPHAD framework, provide an efficient and accurate route to high-throughput thermodynamic modeling, enabling rapid exploration of novel alloy systems. While many challenges remain to be addressed, the accuracy of some of these MLIPs (ORB in particular) is on the verge of paving the way toward high-throughput generation of CALPHAD thermodynamic descriptions of multi-component, multi-phase alloy systems.

LG · Feb 28, 2025
Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

Keqiang Yan, Xiner Li, Hongyi Ling et al.

We consider the problem of crystal materials generation using language models (LMs). A key step is to convert 3D crystal structures into 1D sequences to be processed by LMs. Prior studies used the crystallographic information framework (CIF) file stream, which fails to ensure SE(3) and periodic invariance and may not lead to unique sequence representations for a given crystal structure. Here, we propose a novel method, known as Mat2Seq, to tackle this challenge. Mat2Seq converts 3D crystal structures into 1D sequences and ensures that different mathematical descriptions of the same crystal are represented in a single unique sequence, thereby provably achieving SE(3) and periodic invariance. Experimental results show that, with language models, Mat2Seq achieves promising performance in crystal structure generation as compared with prior methods.

AI · Jun 5, 2025
Toward Greater Autonomy in Materials Discovery Agents: Unifying Planning, Physics, and Scientists

Lianhao Zhou, Hongyi Ling, Keqiang Yan et al.

We aim to design language agents with greater autonomy for crystal materials discovery. While most existing studies restrict agents to performing specific tasks within predefined workflows, we aim to automate workflow planning given high-level goals and scientist intuition. To this end, we propose Materials Agent unifying Planning, Physics, and Scientists, known as MAPPS. MAPPS consists of a Workflow Planner, a Tool Code Generator, and a Scientific Mediator. The Workflow Planner uses large language models (LLMs) to generate structured and multi-step workflows. The Tool Code Generator synthesizes executable Python code for various tasks, including invoking a force field foundation model that encodes physics. The Scientific Mediator coordinates communications, facilitates scientist feedback, and ensures robustness through error reflection and recovery. By unifying planning, physics, and scientists, MAPPS enables flexible and reliable materials discovery with greater autonomy, achieving a five-fold improvement in stability, uniqueness, and novelty rates compared with prior generative models when evaluated on the MP-20 data. We provide extensive experiments across diverse tasks to show that MAPPS is a promising framework for autonomous materials discovery.