Xinzhe Wu

NA
5 papers
2 citations
Novelty 47%
AI Score 48

5 Papers

69.2 · COMP-PH · May 29
Data-Driven Spectral Prediction for Accelerating Large-Scale Electronic Structure Calculations

Abhiram Badrinarayanan, Davor Davidovic, Edoardo Di Napoli et al.

Simulating large molecular systems comprising thousands of atoms requires highly scalable methodologies. While modern Density Functional Theory (DFT) codes exhibit linear scaling, solving the associated large, sparse generalized eigenproblems remains a critical computational bottleneck on exascale architectures. In the context of the LimitX project, we propose a data-driven framework to accelerate these calculations. By shifting the machine learning target from discrete eigenvalues to the coefficients of an interpolating Chebyshev polynomial, and by comparing all-atom and fragment-based structural representations, we overcome the dimensionality constraints of large-scale spectral prediction. We investigate three machine learning models (Kernel Ridge Regression, Graph Neural Networks, and Random Forests) trained on a novel 2 TB dataset of protein dimers. The predicted spectra serve as initial guesses that allow the early Self-Consistent Field (SCF) iterations in BigDFT to be bypassed. Ultimately, these spectral predictors will be deployed to dynamically optimize upcoming rational filter-based eigensolvers, such as FrASE, which is currently in initial development.
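The abstract does not spell out how a spectrum is turned into Chebyshev coefficients. The sketch below shows one plausible reading, under stated assumptions: the sorted eigenvalues are treated as a function of a normalized index on [-1, 1] (linearly interpolated between neighbouring eigenvalues), and that function is interpolated at Chebyshev-Gauss nodes. The helpers `spectrum_to_coeffs` and `coeffs_to_spectrum` are hypothetical names. What the sketch illustrates is that the learning target has a fixed length `n` however many eigenvalues the system has.

```python
import math


def cheb_coeffs(f, n):
    """Coefficients c_0..c_{n-1} of the degree-(n-1) Chebyshev interpolant of f
    on [-1, 1], sampled at the Chebyshev-Gauss nodes x_k = cos(pi (k + 1/2) / n)."""
    thetas = [math.pi * (k + 0.5) / n for k in range(n)]
    values = [f(math.cos(t)) for t in thetas]
    coeffs = [(2.0 / n) * sum(v * math.cos(j * t) for v, t in zip(values, thetas))
              for j in range(n)]
    coeffs[0] /= 2.0  # the T_0 term carries half weight
    return coeffs


def cheb_eval(coeffs, x):
    """Evaluate sum_j c_j T_j(x) with Clenshaw's recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]


def spectrum_to_coeffs(eigs, n):
    """Hypothetical encoding: view the sorted spectrum as a function of a
    normalized index x in [-1, 1] (linear interpolation between eigenvalues)
    and return its n Chebyshev coefficients. Requires len(eigs) >= 2."""
    lam = sorted(eigs)
    m = len(lam)

    def f(x):
        t = (x + 1.0) / 2.0 * (m - 1)  # fractional index in [0, m - 1]
        i = max(0, min(int(math.floor(t)), m - 2))
        w = t - i
        return (1.0 - w) * lam[i] + w * lam[i + 1]

    return cheb_coeffs(f, n)


def coeffs_to_spectrum(coeffs, m):
    """Hypothetical decoding: evaluate the polynomial at m equispaced normalized indices."""
    return [cheb_eval(coeffs, -1.0 + 2.0 * i / (m - 1)) for i in range(m)]
```

Because the coefficient vector has the same length for a 40-eigenvalue and a 1000-eigenvalue spectrum, a regression model could predict it directly; how the actual framework maps structures to coefficients is not described in the abstract.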

71.5 · AI · May 29 · Code
AutoSci: A Memory-Centric Agentic System for the Full Scientific Research Lifecycle

Weitong Qian, Beicheng Xu, Zhongao Xie et al.

Scientific research has traditionally been human-intensive, requiring researchers to coordinate literature, ideas, experiments, manuscripts, and review responses across long project cycles. The rise of LLM-based scientific agents creates an opportunity to automate this process. Such a system must support the full research lifecycle, maintain structured persistent memory across projects, and improve its own research procedures over time. However, existing systems satisfy these requirements only partially or not at all, leaving a gap for a unified automated scientific research system. To fill this gap, we present AutoSci, a memory-centric agentic system for the full scientific research lifecycle. AutoSci is organized around four modules. SciMem provides schema-governed research memory, separating Long-Term Knowledge Memory for reusable scientific knowledge from Active Research Memory for project-level artifacts such as ideas, experiments, manuscripts, and reviews. SciFlow executes a five-stage lifecycle, from literature understanding to rebuttal, through a harness that controls state, context, verification, feedback, and orchestration. SciDAG augments difficult skills with DAG-shaped multi-agent operators and reusable stage-specific templates. SciEvolve converts feedback signals from users, experiments, reviews, and external environments into versioned updates to the SciMem organization, SciFlow skills, and SciDAG templates. Together, these modules make AutoSci a persistent research environment that can execute, remember, and evolve across research projects. The code repository is available at https://github.com/skyllwt/AutoSci.
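As a rough illustration of what schema-governed memory with versioned updates could look like, the minimal sketch below keeps two partitions, mirroring the split between Long-Term Knowledge Memory and Active Research Memory, and records a version bump whenever a record is revised. The schema contents, class names, and method names are all hypothetical and are not taken from the AutoSci repository.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the record kinds each memory partition accepts.
SCHEMA = {
    "long_term": {"concept", "method", "finding"},             # reusable knowledge
    "active": {"idea", "experiment", "manuscript", "review"},  # project artifacts
}


@dataclass
class Record:
    kind: str
    content: str
    version: int = 1


@dataclass
class ResearchMemory:
    partitions: dict = field(default_factory=lambda: {name: {} for name in SCHEMA})
    # Audit log of updates: (partition, key, old_version, new_version).
    history: list = field(default_factory=list)

    def put(self, partition, key, kind, content):
        """Insert or revise a record after checking it against the schema;
        every revision bumps the record's version and is logged."""
        if kind not in SCHEMA[partition]:
            raise ValueError(f"kind {kind!r} is not allowed in {partition!r} memory")
        store = self.partitions[partition]
        old_version = store[key].version if key in store else 0
        store[key] = Record(kind, content, old_version + 1)
        self.history.append((partition, key, old_version, old_version + 1))
        return old_version + 1
```

Keeping the schema outside the store makes the allowed record kinds explicit and checkable, and the history list gives every revision a traceable version number.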

CE · Mar 7, 2023
Computing formation enthalpies through an explainable machine learning method: the case of Lanthanide Orthophosphates solid solutions

Edoardo Di Napoli, Xinzhe Wu, Thomas Bornhake et al.

In the last decade, the use of Machine and Deep Learning (MDL) methods in Condensed Matter physics has increased steeply, both in the number of problems tackled and in the methods employed. Distinct MDL approaches have been applied to many different topics, from the prediction of materials properties to the computation of Density Functional Theory potentials and inter-atomic force fields. In many cases the result is a surrogate model that returns promising predictions but is opaque about the mechanisms behind its success. The typical practitioner, on the other hand, looks for answers that are explainable and provide clear insight into the mechanisms governing a physical phenomenon. In this work, we propose a sophisticated combination of traditional Machine Learning methods to obtain an explainable model that outputs an explicit functional formulation for the material property of interest. We demonstrate the effectiveness of our methodology by deriving a new, highly accurate expression for the enthalpy of formation of lanthanide orthophosphate solid solutions.
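The abstract does not name the specific combination of methods used. To illustrate how a regression model can output an explicit formula rather than an opaque predictor, the sketch below uses a different, deliberately simple technique: exhaustive sparse linear regression over a small library of candidate descriptors, keeping the subset of at most `max_terms` terms (plus an intercept) with the smallest squared error. The function names, descriptor names, and any data fed to it are hypothetical and carry no physical meaning for lanthanide orthophosphates.

```python
from itertools import combinations


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def least_squares(columns, y):
    """Fit y ~ sum_i w_i * columns[i] via the normal equations; return (w, SSE)."""
    k = len(columns)
    gram = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(u * t for u, t in zip(columns[i], y)) for i in range(k)]
    w = solve(gram, rhs)
    sse = sum((t - sum(w[i] * columns[i][s] for i in range(k))) ** 2
              for s, t in enumerate(y))
    return w, sse


def best_formula(features, y, max_terms=2):
    """Try every subset of at most max_terms named descriptors (plus an
    intercept) and keep the one with the smallest training SSE."""
    ones = [1.0] * len(y)
    best = None
    for size in range(1, max_terms + 1):
        for names in combinations(sorted(features), size):
            w, sse = least_squares([ones] + [features[name] for name in names], y)
            if best is None or sse < best[2]:
                best = (names, w, sse)
    return best  # (descriptor names, [intercept, weights...], SSE)
```

The result reads directly as a formula, e.g. y ≈ w0 + w1·a + w2·(a·b). On real data one would compare candidate formulas on held-out samples rather than on training error alone, since larger subsets always fit the training set at least as well.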

85.2 · NA · Apr 16
Chebyshev Accelerated Subspace Eigensolver for Pseudo-hermitian Hamiltonians

Edoardo Di Napoli, Clément Richefort, Xinzhe Wu

Studying the optoelectronic structure of materials can require computing several thousand of the smallest positive eigenpairs of a pseudo-Hermitian Hamiltonian. Iterative eigensolvers may be preferred over direct methods for this task because their complexity is a function of the desired fraction of the spectrum. In addition, they generally rely on highly optimized and scalable kernels, such as matrix-vector multiplications, that leverage the massive parallelism and computational power of modern exascale systems. The Chebyshev Accelerated Subspace iteration Eigensolver (ChASE) can compute several thousand of the most extreme eigenpairs of dense Hermitian matrices, with proven scalability on massively parallel accelerated clusters. This work presents an extension of ChASE that solves for a portion of the smallest positive eigenpairs of pseudo-Hermitian Hamiltonians as they appear in the treatment of excitonic materials. By exploiting the numerical structure and spectral properties of the Hamiltonian matrix, we preserve the characteristic positive-negative symmetry in the treatment of the eigenvectors and propose an oblique variant of the Rayleigh-Ritz projection that features quadratic convergence of the Ritz values without explicitly constructing the dual basis. Additionally, we introduce a parallel implementation of the recursive matrix-product operation appearing in the Chebyshev filter that requires only a limited amount of global communication. Our development is supported by a full numerical analysis and experimental tests.
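The Chebyshev filter used by ChASE rests on the three-term recurrence T_{k+1}(s) = 2s T_k(s) - T_{k-1}(s). A minimal sketch for a real symmetric matrix follows: it maps a chosen interval [a, b] of unwanted eigenvalues onto [-1, 1], where |T_k| ≤ 1, so eigencomponents outside that interval are amplified. The function names are hypothetical; the sketch filters a single vector rather than a block, and it does not implement the pseudo-Hermitian extension, the oblique Rayleigh-Ritz projection, or the parallel matrix-product scheme described in the abstract.

```python
def matvec(H, v):
    """Dense matrix-vector product for H given as a list of rows."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def chebyshev_filter(H, v, degree, a, b):
    """Return p(H) v with p(t) = T_degree((t - c) / e), c = (a + b) / 2, e = (b - a) / 2.

    Eigencomponents of v whose eigenvalues lie in [a, b] are multiplied by a
    factor of magnitude <= 1; components outside [a, b] grow rapidly with the
    degree. Assumes H is real symmetric and degree >= 1.
    """
    c, e = (a + b) / 2.0, (b - a) / 2.0
    y_prev = list(v)                                          # T_0 term
    y = [(hv - c * x) / e for hv, x in zip(matvec(H, v), v)]  # T_1 term
    for _ in range(2, degree + 1):
        hy = matvec(H, y)
        # Three-term recurrence: T_{k+1}(s) = 2 s T_k(s) - T_{k-1}(s), s = (t - c) / e
        y_next = [2.0 * (hy_i - c * y_i) / e - yp_i
                  for hy_i, y_i, yp_i in zip(hy, y, y_prev)]
        y_prev, y = y, y_next
    return y
```

For example, filtering the all-ones vector with H = diag(1, 2, ..., 10), damped interval [3, 10], and degree 10 leaves the components for eigenvalues 3 through 10 bounded by 1 in magnitude, while the component for eigenvalue 1 grows to roughly 1.4e4 and dominates the normalized result.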

96.9 · NA · Mar 11
Estimating the condition number of Chebyshev filtered vectors with application to the ChASE library

Edoardo Di Napoli, Xinzhe Wu

Chebyshev filtered subspace iteration is a well-known algorithm for the solution of (symmetric/Hermitian) algebraic eigenproblems; it has been implemented in several application codes [Kronik:2006ff, abinit:2020] and in standalone libraries [ChASE]. An essential part of the algorithm is the QR factorization of the array of vectors that span the active subspace after being filtered by the Chebyshev filter. Such an array typically has a high condition number that is not known a priori and that directly influences the choice of QR factorization algorithm. In this work we show how this condition number can be bounded from above with precise and inexpensive estimates. We then use these estimates to implement a mechanism for choosing the QR factorization in the ChASE library, and we show how this mechanism enhances the performance of the library without compromising its accuracy.
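The abstract does not state which QR variants ChASE chooses between or which thresholds it uses. The sketch below assumes, for illustration, a choice among CholeskyQR2, shifted CholeskyQR3, and Householder QR driven by a condition-number estimate `kappa_est`, with thresholds tied to the unit roundoff u. CholeskyQR2 is commonly reported to be reliable for condition numbers up to roughly u^(-1/2), and shifted CholeskyQR3 up to roughly u^(-1); the safety factor of 100 is an arbitrary, hypothetical margin. Only CholeskyQR2 is implemented here; the other branches merely return a label.

```python
import math

UNIT_ROUNDOFF = 2.0 ** -53  # IEEE 754 double precision
SAFETY = 100.0              # hypothetical margin below the theoretical limits


def choose_qr(kappa_est, u=UNIT_ROUNDOFF):
    """Hypothetical selector: pick a QR variant from an upper estimate of cond(V)."""
    if kappa_est < u ** -0.5 / SAFETY:
        return "cholqr2"
    if kappa_est < u ** -1.0 / SAFETY:
        return "shifted_cholqr3"
    return "householder"


def cholesky_qr(cols):
    """One CholeskyQR pass on a tall matrix given as a list of columns:
    G = V^T V, G = R^T R (Cholesky, R upper triangular), Q = V R^{-1}."""
    k = len(cols)
    G = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    R = [[0.0] * k for _ in range(k)]
    for i in range(k):
        s = G[i][i] - sum(R[p][i] ** 2 for p in range(i))
        if s <= 0.0:
            raise ArithmeticError("Gram matrix is not numerically positive definite")
        R[i][i] = math.sqrt(s)
        for j in range(i + 1, k):
            R[i][j] = (G[i][j] - sum(R[p][i] * R[p][j] for p in range(i))) / R[i][i]
    # Column j of V equals sum_{p <= j} Q[:, p] R[p][j]; solve for Q[:, j].
    Q = []
    for j in range(k):
        q = list(cols[j])
        for p in range(j):
            q = [qi - R[p][j] * qp for qi, qp in zip(q, Q[p])]
        Q.append([qi / R[j][j] for qi in q])
    return Q, R


def cholesky_qr2(cols):
    """CholeskyQR applied twice; the second pass repairs lost orthogonality.
    Returns Q (list of columns) and R = R2 R1."""
    Q1, R1 = cholesky_qr(cols)
    Q, R2 = cholesky_qr(Q1)
    k = len(cols)
    R = [[sum(R2[i][p] * R1[p][j] for p in range(k)) for j in range(k)] for i in range(k)]
    return Q, R
```

CholeskyQR-type methods are attractive in this setting because they need only a Gram-matrix product and a small Cholesky factorization, which map well onto accelerators and require little communication; their weakness is that forming V^T V squares the condition number, which is why an a priori estimate of cond(V) is useful for deciding when they are safe.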