h-index34
66papers
735citations
Novelty55%
AI Score57

66 Papers

NAOct 21, 2011
Adaptive local basis set for Kohn-Sham density functional theory in a discontinuous Galerkin framework I: Total energy calculation

Lin Lin, Jianfeng Lu, Lexing Ying et al.

Kohn-Sham density functional theory is one of the most widely used electronic structure theories. In the pseudopotential framework, uniform discretization of the Kohn-Sham Hamiltonian generally results in a large number of basis functions per atom in order to resolve the rapid oscillations of the Kohn-Sham orbitals around the nuclei. Previous attempts to reduce the number of basis functions per atom include the usage of atomic orbitals and similar objects, but the atomic orbitals generally require fine tuning in order to reach high accuracy. We present a novel discretization scheme that adaptively and systematically builds the rapid oscillations of the Kohn-Sham orbitals around the nuclei as well as environmental effects into the basis functions. The resulting basis functions are localized in the real space, and are discontinuous in the global domain. The continuous Kohn-Sham orbitals and the electron density are evaluated from the discontinuous basis functions using the discontinuous Galerkin (DG) framework. Our method is implemented in parallel and the current implementation is able to handle systems with at least thousands of atoms. Numerical examples indicate that our method can reach very high accuracy (less than 1meV) with a very small number ($4\sim 40$) of basis functions per atom.

MTRL-SCIDec 23, 2008
Multipole Representation of the Fermi Operator with Application to the Electronic Structure Analysis of Metallic Systems

Lin Lin, Jianfeng Lu, Roberto Car et al.

We propose a multipole representation of the Fermi-Dirac function and the Fermi operator, and use this representation to develop algorithms for electronic structure analysis of metallic systems. The new algorithm is quite simple and efficient. Its computational cost scales logarithmically with $βΔ\eps$ where $β$ is the inverse temperature, and $Δ\eps$ is the width of the spectrum of the discretized Hamiltonian matrix.

NAMay 7, 2016
Low Rank Approximation in $G_0W_0$ Approximation

Meiyue Shao, Lin Lin, Chao Yang et al.

The single particle energies obtained in a Kohn--Sham density functional theory (DFT) calculation are generally known to be poor approximations to electron excitation energies that are measured in transport, tunneling and spectroscopic experiments such as photo-emission spectroscopy. The correction to these energies can be obtained from the poles of a single particle Green's function derived from a many-body perturbation theory. From a computational perspective, the accuracy and efficiency of such an approach depends on how a self energy term that properly accounts for dynamic screening of electrons is approximated. The $G_0W_0$ approximation is a widely used technique in which the self energy is expressed as the convolution of a non-interacting Green's function ($G_0$) and a screened Coulomb interaction ($W_0$) in the frequency domain. The computational cost associated with such a convolution is high due to the high complexity of evaluating $W_0$ at multiple frequencies. In this paper, we discuss how the cost of $G_0W_0$ calculation can be reduced by constructing a low rank approximation to the frequency dependent part of $W_0$. In particular, we examine the effect of such a low rank approximation on the accuracy of the $G_0W_0$ approximation. We also discuss how the numerical convolution of $G_0$ and $W_0$ can be evaluated efficiently and accurately by using a contour deformation technique with an appropriate choice of the contour.

COMP-PHMay 9, 2013
Elliptic preconditioner for accelerating the self consistent field iteration in Kohn-Sham density functional theory

Lin Lin, Chao Yang

We discuss techniques for accelerating the self consistent field (SCF) iteration for solving the Kohn-Sham equations. These techniques are all based on constructing approximations to the inverse of the Jacobian associated with a fixed point map satisfied by the total potential. They can be viewed as preconditioners for a fixed point iteration. We point out different requirements for constructing preconditioners for insulating and metallic systems respectively, and discuss how to construct preconditioners to keep the convergence rate of the fixed point iteration independent of the size of the atomistic system. We propose a new preconditioner that can treat insulating and metallic system in a unified way. The new preconditioner, which we call an elliptic preconditioner, is constructed by solving an elliptic partial differential equation. The elliptic preconditioner is shown to be more effective in accelerating the convergence of a fixed point iteration than the existing approaches for large inhomogeneous systems at low temperature.

COMP-PHJan 27, 2016
Adaptively Compressed Exchange Operator

Lin Lin

The Fock exchange operator plays a central role in modern quantum chemistry. The large computational cost associated with the Fock exchange operator hinders Hartree-Fock calculations and Kohn-Sham density functional theory calculations with hybrid exchange-correlation functionals, even for systems consisting of hundreds of atoms. We develop the adaptively compressed exchange operator (ACE) formulation, which greatly reduces the computational cost associated with the Fock exchange operator without loss of accuracy. The ACE formulation does not depend on the size of the band gap, and thus can be applied to insulating, semiconducting as well as metallic systems. In an iterative framework for solving Hartree-Fock-like systems, the ACE formulation only requires moderate modification of the code, and can be potentially beneficial for all electronic structure software packages involving exchange calculations. Numerical results indicate that the ACE formulation can become advantageous even for small systems with tens of atoms. In particular, the cost of each self-consistent field iteration for the electron density in the ACE formulation is only marginally larger than that of the generalized gradient approximation (GGA) calculation, and thus offers orders of magnitude speedup for Hartree-Fock-like calculations.

NAFeb 25, 2019
A multiscale neural network based on hierarchical nested bases

Yuwei Fan, Jordi Feliu-Faba, Lin Lin et al.

In recent years, deep learning has led to impressive results in many fields. In this paper, we introduce a multi-scale artificial neural network for high-dimensional non-linear maps based on the idea of hierarchical nested bases in the fast multipole method and the $\mathcal{H}^2$-matrices. This approach allows us to efficiently approximate discretized nonlinear maps arising from partial differential equations or integral equations. It also naturally extends our recent work based on the generalization of hierarchical matrices [Fan et al. arXiv:1807.01883] but with a reduced number of parameters. In particular, the number of parameters of the neural network grows linearly with the dimension of the parameter space of the discretized PDE. We demonstrate the properties of the architecture by approximating the solution maps of non-linear Schr{ö}dinger equation, the radiative transfer equation, and the Kohn-Sham map.

NAMay 25, 2016
Adaptively compressed polarizability operator for accelerating large scale \textit{ab initio} phonon calculations

Lin Lin, Ze Xu, Lexing Ying

Phonon calculations based on first principle electronic structure theory, such as the Kohn-Sham density functional theory, have wide applications in physics, chemistry and material science. The computational cost of first principle phonon calculations typically scales steeply as $\mathcal{O}(N_e^4)$, where $N_e$ is the number of electrons in the system. In this work, we develop a new method to reduce the computational complexity of computing the full dynamical matrix, and hence the phonon spectrum, to $\mathcal{O}(N_e^3)$. The key concept for achieving this is to compress the polarizability operator adaptively with respect to the perturbation of the potential due to the change of the atomic configuration. Such adaptively compressed polarizability operator (ACP) allows accurate computation of the phonon spectrum. The reduction of complexity only weakly depends on the size of the band gap, and our method is applicable to insulators as well as semiconductors with small band gaps. We demonstrate the effectiveness of our method using one-dimensional and two-dimensional model problems.

COMP-PHNov 28, 2011
Optimized local basis set for Kohn-Sham density functional theory

Lin Lin, Jianfeng Lu, Lexing Ying et al.

We develop a technique for generating a set of optimized local basis functions to solve models in the Kohn-Sham density functional theory for both insulating and metallic systems. The optimized local basis functions are obtained by solving a minimization problem in an admissible set determined by a large number of primitive basis functions. Using the optimized local basis set, the electron energy and the atomic force can be calculated accurately with a small number of basis functions. The Pulay force is systematically controlled and is not required to be calculated, which makes the optimized local basis set an ideal tool for ab initio molecular dynamics and structure optimization. We also propose a preconditioned Newton-GMRES method to obtain the optimized local basis functions in practice. The optimized local basis set is able to achieve high accuracy with a small number of basis functions per atom when applied to a one dimensional model problem.

NANov 23, 2015
Randomized estimation of spectral densities of large matrices made accurate

Lin Lin

For a large Hermitian matrix $A\in \mathbb{C}^{N\times N}$, it is often the case that the only affordable operation is matrix-vector multiplication. In such case, randomized method is a powerful way to estimate the spectral density (or density of states) of $A$. However, randomized methods developed so far for estimating spectral densities only extract information from different random vectors independently, and the accuracy is therefore inherently limited to $\mathcal{O}(1/\sqrt{N_{v}})$ where $N_{v}$ is the number of random vectors. In this paper we demonstrate that the "$\mathcal{O}(1/\sqrt{N_{v}})$ barrier" can be overcome by taking advantage of the correlated information of random vectors when properly filtered by polynomials of $A$. Our method uses the fact that the estimation of the spectral density essentially requires the computation of the trace of a series of matrix functions that are numerically low rank. By repeatedly applying $A$ to the same set of random vectors and taking different linear combination of the results, we can sweep through the entire spectrum of $A$ by building such low rank decomposition at different parts of the spectrum. Under some assumptions, we demonstrate that a robust and efficient implementation of such spectrum sweeping method can compute the spectral density accurately with $\mathcal{O}(N^2)$ computational cost and $\mathcal{O}(N)$ memory cost. Numerical results indicate that the new method can significantly outperform existing randomized methods in terms of accuracy. As an application, we demonstrate a way to accurately compute a trace of a smooth matrix function, by carefully balancing the smoothness of the integrand and the regularized density of states using a deconvolution procedure.

COMP-PHMar 21, 2013
Efficient iterative method for solving the Dirac-Kohn-Sham density functional theory

Lin Lin, Sihong Shao, Weinan E

We present for the first time an efficient iterative method to directly solve the four-component Dirac-Kohn-Sham (DKS) density functional theory. Due to the existence of the negative energy continuum in the DKS operator, the existing iterative techniques for solving the Kohn-Sham systems cannot be efficiently applied to solve the DKS systems. The key component of our method is a novel filtering step (F) which acts as a preconditioner in the framework of the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The resulting method, dubbed the LOBPCG-F method, is able to compute the desired eigenvalues and eigenvectors in the positive energy band without computing any state in the negative energy band. The LOBPCG-F method introduces mild extra cost compared to the standard LOBPCG method and can be easily implemented. We demonstrate our method in the pseudopotential framework with a planewave basis set which naturally satisfies the kinetic balance prescription. Numerical results for Pt$_{2}$, Au$_{2}$, TlF, and Bi$_{2}$Se$_{3}$ indicate that the LOBPCG-F method is a robust and efficient method for investigating the relativistic effect in systems containing heavy elements.

NANov 26, 2018
A General Framework for Enhancing Sparsity of Generalized Polynomial Chaos Expansions

Xiu Yang, Xiaoliang Wan, Lin Lin et al.

Compressive sensing has become a powerful addition to uncertainty quantification when only limited data is available. In this paper we provide a general framework to enhance the sparsity of the representation of uncertainty in the form of generalized polynomial chaos expansion. We use alternating direction method to identify new sets of random variables through iterative rotations such that the new representation of the uncertainty is sparser. Consequently, we increases both the efficiency and accuracy of the compressive sensing-based uncertainty quantification method. We demonstrate that the previously developed iterative method to enhance the sparsity of Hermite polynomial expansion is a special case of this general framework. Moreover, we use Legendre and Chebyshev polynomials expansions to demonstrate the effectiveness of this method with applications in solving stochastic partial differential equations and high-dimensional (O(100)) problems.

NANov 20, 2017
Convergence of adaptive compression methods for Hartree-Fock-like equations

Lin Lin, Michael Lindsey

The adaptively compressed exchange (ACE) method provides an efficient way for solving Hartree-Fock-like equations in quantum physics, chemistry, and materials science. The key step of the ACE method is to adaptively compress an operator that is possibly dense and full-rank. In this paper, we present a detailed study of the adaptive compression operation and establish rigorous convergence properties of the adaptive compression method in the context of solving linear eigenvalue problems. Our analysis also elucidates the potential use of the adaptive compression method in a wide range of problems.

NAMar 14, 2016
A posteriori error estimates for discontinuous Galerkin methods using non-polynomial basis functions. Part II: Eigenvalue problems

Lin Lin, Benjamin Stamm

We present the first systematic work for deriving a posteriori error estimates for general non-polynomial basis functions in an interior penalty discontinuous Galerkin (DG) formulation for solving eigenvalue problems associated with second order linear operators. Eigenvalue problems of such types play important roles in scientific and engineering applications, particularly in theoretical chemistry, solid state physics and material science. Based on the framework developed in [{\it L. Lin, B. Stamm, http://dx.doi.org/10.1051/m2an/2015069}] for second order PDEs, we develop residual type upper and lower bound error estimates for measuring the a posteriori error for eigenvalue problems. The main merit of our method is that the method is parameter-free, in the sense that all but one solution-dependent constants appearing in the upper and lower bound estimates are explicitly computable by solving local and independent eigenvalue problems, and the only non-computable constant can be reasonably approximated by a computable one without affecting the overall effectiveness of the estimates in practice. Compared to the PDE case, we find that a posteriori error estimators for eigenvalue problems must neglect certain terms, which involves explicitly the exact eigenvalues or eigenfunctions that are not accessible in numerical simulations. We define such terms carefully, and justify numerically that the neglected terms are indeed numerically high order terms compared to the computable estimators. Numerical results for a variety of problems in 1D and 2D demonstrate that both the upper bound and lower bound are effective for measuring the error of eigenvalues and eigenfunctions.

NANov 23, 2015
Localized spectrum slicing

Lin Lin

Given a sparse Hermitian matrix $A$ and a real number $μ$, we construct a set of sparse vectors, each approximately spanned only by eigenvectors of $A$ corresponding to eigenvalues near $μ$. This set of vectors spans the column space of a localized spectrum slicing (LSS) operator, and is called an LSS basis set. The sparsity of the LSS basis set is related to the decay properties of matrix Gaussian functions. We present a divide-and-conquer strategy with controllable error to construct the LSS basis set. This is a purely algebraic process using only submatrices of $A$, and can therefore be applied to general sparse Hermitian matrices. The LSS basis set leads to sparse projected matrices with reduced sizes, which allows the projected problems to be solved efficiently with techniques using sparse linear algebra. As an example, we demonstrate that the LSS basis set can be used to solve interior eigenvalue problems for a discretized second order partial differential operator in one-dimensional and two-dimensional domains, as well as for a matrix of general sparsity pattern.

COMP-PHOct 19, 2016
PEXSI-$Σ$: A Green's function embedding method for Kohn-Sham density functional theory

Xiantao Li, Lin Lin, Jianfeng Lu

In this paper, we propose a new Green's function embedding method called PEXSI-$Σ$ for describing complex systems within the Kohn-Sham density functional theory (KSDFT) framework, after revisiting the physics literature of Green's function embedding methods from a numerical linear algebra perspective. The PEXSI-$Σ$ method approximates the density matrix using a set of nearly optimally chosen Green's functions evaluated at complex frequencies. For each Green's function, the complex boundary conditions are described by a self energy matrix $Σ$ constructed from a physical reference Green's function, which can be computed relatively easily. In the linear regime, such treatment of the boundary condition can be numerically exact. The support of the $Σ$ matrix is restricted to degrees of freedom near the boundary of computational domain, and can be interpreted as a frequency dependent surface potential. This makes it possible to perform KSDFT calculations with $\mathcal{O}(N^2)$ computational complexity, where $N$ is the number of atoms within the computational domain. Green's function embedding methods are also naturally compatible with atomistic Green's function methods for relaxing the atomic configuration outside the computational domain. As a proof of concept, we demonstrate the accuracy of the PEXSI-$Σ$ method for graphene with divacancy and dislocation dipole type of defects using the DFTB+ software package.

MLOct 6, 2022
Probabilistic Model Incorporating Auxiliary Covariates to Control FDR

Lin Qiu, Nils Murrugarra-Llerena, Vítor Silva et al.

Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and auxiliary metrics or covariates. We incorporate auxiliary covariates among test-level covariates in a deep Black-Box framework controlling FDR (named as NeurT-FDR) which boosts statistical power and controls FDR for multiple-hypothesis testing. Our method parametrizes the test-level covariates as a neural network and adjusts the auxiliary covariates through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR makes substantially more discoveries in three real datasets compared to competitive baselines.

CLFeb 11
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Ailin Huang, Ang Li, Aobo Kong et al.

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.

NAJun 15, 2016
Decay estimates of discretized Green's functions for Schrödinger type operators

Lin Lin, Jianfeng Lu

For a sparse non-singular matrix $A$, generally $A^{-1}$ is a dense matrix. However, for a class of matrices, $A^{-1}$ can be a matrix with off-diagonal decay properties, i.e. $\lvert A^{-1}_{ij}\rvert$ decays fast to $0$ with respect to the increase of a properly defined distance between $i$ and $j$. Here we consider the off-diagonal decay properties of discretized Green's functions for Schrödinger type operators. We provide decay estimates for discretized Green's functions obtained from the finite difference discretization, and from a variant of the pseudo-spectral discretization. The asymptotic decay rate in our estimate is independent of the domain size and of the discretization parameter. We verify the decay estimate with numerical results for one-dimensional Schrödinger type operators.

NAJun 1, 2018
Globally Constructed Adaptive Local Basis Set for Spectral Projectors of Second Order Differential Operators

Yingzhou Li, Lin Lin

Spectral projectors of second order differential operators play an important role in quantum physics and other scientific and engineering applications. In order to resolve local features and to obtain converged results, typically the number of degrees of freedom needed is much larger than the rank of the spectral projector. This leads to significant cost in terms of both computation and storage. In this paper, we develop a method to construct a basis set that is adaptive to the given differential operator. The basis set is systematically improvable, and the local features of the projector is built into the basis set. As a result the required number of degrees of freedom is only a small constant times the rank of the projector. The construction of the basis set uses a randomized procedure, and only requires applying the differential operator to a small number of vectors on the global domain, while each basis function itself is supported on strictly local domains and is discontinuous across the global domain. The spectral projector on the global domain is systematically approximated from such a basis set using the discontinuous Galerkin (DG) method. The global construction procedure is very flexible, and allows a local basis set to be consistently constructed even if the operator contains a nonlocal potential term. We verify the effectiveness of the globally constructed adaptive local basis set using one-, two- and three-dimensional linear problems with local potentials, as well as a one dimensional nonlinear problem with nonlocal potentials resembling the Hartree-Fock problem in quantum physics.

COMP-PHJan 25, 2018
Variational formulation for Wannier functions with entangled band structure

Anil Damle, Antoine Levitt, Lin Lin

Wannier functions provide a localized representation of spectral subspaces of periodic Hamiltonians, and play an important role for interpreting and accelerating Hartree-Fock and Kohn-Sham density functional theory calculations in quantum physics and chemistry. For systems with isolated band structure, the existence of exponentially localized Wannier functions and numerical algorithms for finding them are well studied. In contrast, for systems with entangled band structure, Wannier functions must be generalized to span a subspace larger than the spectral subspace of interest to achieve favorable spatial locality. In this setting, little is known about the theoretical properties of these Wannier functions, and few algorithms can find them robustly. We develop a variational formulation to compute these generalized maximally localized Wannier functions. When paired with an initial guess based on the selected columns of the density matrix (SCDM) method, our method can robustly find Wannier functions for systems with entangled band structure. We formulate the problem as a constrained nonlinear optimization problem, and show how the widely used disentanglement procedure can be interpreted as a splitting method to approximately solve this problem. We demonstrate the performance of our method using real materials including silicon, copper, and aluminum. To examine more precisely the localization properties of Wannier functions, we study the free electron gas in one and two dimensions, where we show that the maximally-localized Wannier functions only decay algebraically. We also explain using a one dimensional example how to modify them to obtain super-algebraic decay.

NAMar 22, 2023
Anti-symmetric Barron functions and their approximation with sums of determinants

Nilin Abrahamsen, Lin Lin

A fundamental problem in quantum physics is to encode functions that are completely anti-symmetric under permutations of identical particles. The Barron space consists of high-dimensional functions that can be parameterized by infinite neural networks with one hidden layer. By explicitly encoding the anti-symmetric structure, we prove that the anti-symmetric functions which belong to the Barron space can be efficiently approximated with sums of determinants. This yields a factorial improvement in complexity compared to the standard representation in the Barron space and provides a theoretical explanation for the effectiveness of determinant-based architectures in ab-initio quantum chemistry.

QUANT-PHApr 29
Quantum Filtering and Analysis of Multiplicities in Eigenvalue Spectra

Zhiyan Ding, Lin Lin, Yilun Yang et al.

Fine-grained spectral properties of quantum Hamiltonians, including both eigenvalues and their multiplicities, provide useful information for characterizing many-body quantum systems as well as for understanding phenomena such as topological order. Extracting such information with small additive error is $\#\textsf{BQP}$-complete in the worst case. In this work, we introduce QFAMES (Quantum Filtering and Analysis of Multiplicities in Eigenvalue Spectra), a quantum algorithm that efficiently identifies clusters of closely spaced dominant eigenvalues and determines their multiplicities under physically motivated assumptions, which allows us to bypass worst-case complexity barriers. QFAMES also enables the estimation of observable expectation values within targeted energy clusters, providing a powerful tool for studying quantum phase transitions and other physical properties. We validate the effectiveness of QFAMES through numerical demonstrations, including its applications to characterizing quantum phases in the transverse-field Ising model and estimating the ground-state degeneracy of a topologically ordered phase in the two-dimensional toric code model. We also generalize QFAMES to the setting of mixed initial states. Our approach offers rigorous theoretical guarantees and significant advantages over existing subspace-based quantum spectral analysis methods, particularly in terms of the sample complexity and the ability to resolve degeneracies.

MEFeb 24, 2023
Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning

Rui Miao, Zhengling Qi, Cong Shi et al.

Pricing based on individual customer characteristics is widely used to maximize sellers' revenues. This work studies offline personalized pricing under endogeneity using an instrumental variable approach. Standard instrumental variable methods in causal inference/econometrics either focus on a discrete treatment space or require the exclusion restriction of instruments from having a direct effect on the outcome, which limits their applicability in personalized pricing. In this paper, we propose a new policy learning method for Personalized pRicing using Invalid iNsTrumental variables (PRINT) for continuous treatment that allow direct effects on the outcome. Specifically, relying on the structural models of revenue and price, we establish the identifiability condition of an optimal pricing strategy under endogeneity with the help of invalid instrumental variables. Based on this new identification, which leads to solving conditional moment restrictions with generalized residual functions, we construct an adversarial min-max estimator and learn an optimal pricing strategy. Furthermore, we establish an asymptotic regret bound to find an optimal pricing strategy. Finally, we demonstrate the effectiveness of the proposed method via extensive simulation studies as well as a real data application from an US online auto loan company.

LGMar 21, 2023
Convergence of variational Monte Carlo simulation and scale-invariant pre-training

Nilin Abrahamsen, Zhiyan Ding, Gil Goldshlager et al.

We provide theoretical convergence bounds for the variational Monte Carlo (VMC) method as applied to optimize neural network wave functions for the electronic structure problem. We study both the energy minimization phase and the supervised pre-training phase that is commonly used prior to energy minimization. For the energy minimization phase, the standard algorithm is scale-invariant by design, and we provide a proof of convergence for this algorithm without modifications. The pre-training stage typically does not feature such scale-invariance. We propose using a scale-invariant loss for the pretraining phase and demonstrate empirically that it leads to faster pre-training.

CLDec 23, 2025
Step-DeepResearch Technical Report

Chen Hu, Haikuo Du, Heng Wang et al.

As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in intent recognition, long-horizon decision-making, and cross-source verification. To address this, we introduce Step-DeepResearch, a cost-effective, end-to-end agent. We propose a Data Synthesis Strategy Based on Atomic Capabilities to reinforce planning and report writing, combined with a progressive training path from agentic mid-training to SFT and RL. Enhanced by a Checklist-style Judger, this approach significantly improves robustness. Furthermore, to bridge the evaluation gap in the Chinese domain, we establish ADR-Bench for realistic deep research scenarios. Experimental results show that Step-DeepResearch (32B) scores 61.4% on Scale AI Research Rubrics. On ADR-Bench, it significantly outperforms comparable models and rivals SOTA closed-source models like OpenAI and Gemini DeepResearch. These findings prove that refined training enables medium-sized models to achieve expert-level capabilities at industry-leading cost-efficiency.

IVAug 14, 2024
Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

Senmao Wang, Haifan Gong, Runmeng Cui et al.

Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range costal cartilage relationships. Our method leverages a deformable model that integrates topological priors to enhance the adaptability and accuracy of the segmentation process. Furthermore, we developed a comprehensive benchmark that contains 165 cases for costal cartilage segmentation. This benchmark sets a new standard for evaluating costal cartilage segmentation techniques and provides a valuable resource for future research. Extensive experiments conducted on both in-domain benchmarks and out-of domain test sets demonstrate the superiority of our approach over existing methods, showing significant improvements in segmentation precision and robustness.

LGMay 24, 2022
Efficient anti-symmetrization of a neural network layer by taming the sign problem

Nilin Abrahamsen, Lin Lin

Explicit antisymmetrization of a neural network is a potential candidate for a universal function approximator for generic antisymmetric functions, which are ubiquitous in quantum physics. However, this procedure is a priori factorially costly to implement, making it impractical for large numbers of particles. The strategy also suffers from a sign problem. Namely, due to near-exact cancellation of positive and negative contributions, the magnitude of the antisymmetrized function may be significantly smaller than before anti-symmetrization. We show that the anti-symmetric projection of a two-layer neural network can be evaluated efficiently, opening the door to using a generic antisymmetric layer as a building block in anti-symmetric neural network Ansatzes. This approximation is effective when the sign problem is controlled, and we show that this property depends crucially the choice of activation function under standard Xavier/He initialization methods. As a consequence, using a smooth activation function requires re-scaling of the neural network weights compared to standard initializations.

QUANT-PHMar 17
Towards End-to-End Quantum Estimation of Non-Hermitian Pseudospectra

Gengzhi Yang, Jiaqi Leng, Xiaodi Wu et al.

Non-Hermitian many-body systems can be spectrally unstable, so small perturbations may induce large eigenvalue shifts. The pseudospectrum quantifies this instability and provides a perturbation-robust diagnostic. For inverse-polynomially small $ε$, we show that deciding whether a point $z\in\mathbb{C}$ is $ε$-close to the spectrum is PSPACE-hard for $5$-local operators, whereas deciding whether $z$ lies in the $ε$-pseudospectrum is QMA-complete for $4$-local operators. This identifies pseudospectrum membership as a natural computational target. We then present a concrete end-to-end quantum framework for deciding pseudospectrum membership, which combines a singular-value estimation step with a dissipative state preparation algorithm. Our Quantum Singular-value Gaussian-filtered Search (QSIGS) combines quantum singular value transformation (QSVT) with classical post-processing to achieve Heisenberg-limited query scaling for singular-value estimation. To prepare suitable input states, we introduce an algorithmic Lindbladian protocol for approximate ground right singular vectors and prove its effectiveness for the Hatano--Nelson model. Finally, we demonstrate the full pipeline on a trapped-ion quantum computer and distinguish points inside and outside the target pseudospectrum near the exceptional point of a minimal non-Hermitian qubit model.

LGMay 10
One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning

Bowen He, Juncheng Dong, Lin Lin et al.

A central challenge in reinforcement learning (RL) is to learn models that generalize beyond the tasks on which they are trained, a goal traditionally pursued through multi-task and meta RL. Recently, transformer architectures have emerged as a promising approach, enabling adaptation to new tasks via in-context learning without explicit parameter updates. From a functional perspective, a transformer can be viewed as a functional operator that maps a context to a task-specific function. It is thus fundamental to understand and design this operator to support stronger generalization in RL. In this work, we address this resulting question of generalization from a kernel-based perspective by establishing a connection between non-linear transformers and kernel-based temporal difference learning. By interpreting the transformer as performing regression in a Reproducing Kernel Hilbert Space (RKHS), we show that value functions from different domains can be represented using a shared set of weights, provided they lie within the same RKHS. Experiments on multiple MetaWorld domains support this interpretation, demonstrating convergence of the temporal-difference objective.

ASFeb 15, 2022Code
Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model

Ying Shen, Huiyu Yang, Lin Lin

Depression is a global mental health problem, the worst case of which can lead to suicide. An automatic depression detection system provides great help in facilitating depression self-assessment and improving diagnostic accuracy. In this work, we propose a novel depression detection approach utilizing speech characteristics and linguistic contents from participants' interviews. In addition, we establish an Emotional Audio-Textual Depression Corpus (EATD-Corpus) which contains audios and extracted transcripts of responses from depressed and non-depressed volunteers. To the best of our knowledge, EATD-Corpus is the first and only public depression dataset that contains audio and text data in Chinese. Evaluated on two depression datasets, the proposed method achieves the state-of-the-art performances. The outperforming results demonstrate the effectiveness and generalization ability of the proposed method. The source code and EATD-Corpus are available at https://github.com/speechandlanguageprocessing/ICASSP2022-Depression.

LGJul 25, 2025
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

StepFun, Bin Wang, Bojun Wang et al.

Large language models (LLMs) face low hardware efficiency during decoding, especially for long-context reasoning tasks. This paper introduces Step-3, a 321B-parameter VLM with hardware-aware model-system co-design optimized for minimizing decoding costs. Step-3 innovates in two key dimensions: (1) A novel Multi-Matrix Factorization Attention (MFA) mechanism that significantly reduces both KV cache size and computation while maintaining high attention expressiveness, and (2) Attention-FFN Disaggregation (AFD), a distributed inference system that decouples attention and Feed-Forward Network (FFN) layers into specialized subsystems. This co-design achieves unprecedented cost efficiency: Step-3 significantly reduces theoretical decoding costs compared with models like DeepSeek-V3 and Qwen3 MoE 235B, with the gains widening at longer context. Step-3 achieves low cost while activating 38B parameters per token (more than DeepSeek-V3 and Qwen3 MoE 235B), demonstrating that hardware-aligned attention arithmetic intensity, MoE sparsity, and AFD are critical to cost-effectiveness. We perform a head-to-head comparison with DeepSeek-V3 in its favorable scenarios. Our implementation on Hopper GPUs achieves a decoding throughput of up to 4,039 tokens per second per GPU under 50ms TPOT SLA (4K context, FP8, no MTP). It is higher than DeepSeek-V3's 2,324 in the same setup and sets a new Pareto frontier for LLM decoding.

QUANT-PHApr 24
Accelerating quantum Gibbs sampling without quantum walks

Jiaqi Leng, Jiaqing Jiang, Lin Lin

Szegedy's quantum walk gives a generic quadratic speedup for reversible classical Markov chains, but extending this mechanism to quantum Gibbs sampling has remained challenging beyond special cases. We present a walk-free quantum algorithm for preparing purified Gibbs states with a quadratic improvement in spectral-gap dependence for a broad class of quantum Gibbs samplers that satisfy exact Kubo-Martin-Schwinger detailed balance. Our main structural result is an explicit factorization of the corresponding parent Hamiltonian into noncommutative first-order operators. This turns purified Gibbs-state preparation into a singular-value filtering problem and enables a quantum singular value transformation algorithm with quadratically improved gap dependence under standard coherent-access assumptions. The framework applies to several efficiently implementable Gibbs samplers beyond the Davies setting. We also introduce an auxiliary dissipative dynamics based on the same factorization, which can be used to generate warm starts in the doubled Hilbert space in metastable regimes.

LGFeb 2, 2025
Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing

Gil Goldshlager, Jiang Hu, Lin Lin

Due to the ever growing amounts of data leveraged for machine learning and scientific computing, it is increasingly important to develop algorithms that sample only a small portion of the data at a time. In the case of linear least-squares, the randomized block Kaczmarz method (RBK) is an appealing example of such an algorithm, but its convergence is only understood under sampling distributions that require potentially prohibitively expensive preprocessing steps. To address this limitation, we analyze RBK when the data is sampled uniformly, showing that its iterates converge in a Monte Carlo sense to a $\textit{weighted}$ least-squares solution. Unfortunately, for general problems the condition number of the weight matrix and the variance of the iterates can become arbitrarily large. We control these issues by incorporating regularization into the RBK iterations, yielding the regularized algorithm ReBlocK. Numerical experiments including examples arising from natural gradient optimization demonstrate that ReBlocK can outperform both RBK and minibatch stochastic gradient descent for inconsistent problems with rapidly decaying singular values.

IMApr 2, 2024
CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST)

Xu Li, Ruiqi Sun, Jiameng Lv et al.

Strong gravitational lensing is a powerful tool for investigating dark matter and dark energy properties. With the advent of large-scale sky surveys, we can discover strong lensing systems on an unprecedented scale, which requires efficient tools to extract them from billions of astronomical objects. The existing mainstream lens-finding tools are based on machine learning algorithms and applied to cut-out-centered galaxies. However, according to the design and survey strategy of optical surveys by CSST, preparing cutouts with multiple bands requires considerable efforts. To overcome these challenges, we have developed a framework based on a hierarchical visual Transformer with a sliding window technique to search for strong lensing systems within entire images. Moreover, given that multi-color images of strong lensing systems can provide insights into their physical characteristics, our framework is specifically crafted to identify strong lensing systems in images with any number of channels. As evaluated using CSST mock data based on an Semi-Analytic Model named CosmoDC2, our framework achieves precision and recall rates of 0.98 and 0.90, respectively. To evaluate the effectiveness of our method in real observations, we have applied it to a subset of images from the DESI Legacy Imaging Surveys and media images from Euclid Early Release Observations. 61 new strong lensing system candidates are discovered by our method. However, we also identified false positives arising primarily from the simplified galaxy morphology assumptions within the simulation. This underscores the practical limitations of our approach while simultaneously highlighting potential avenues for future improvements.

QUANT-PHMay 8, 2025
Operator-Level Quantum Acceleration of Non-Logconcave Sampling

Jiaqi Leng, Zhiyan Ding, Zherui Chen et al.

Sampling from probability distributions of the form $σ\propto e^{-βV}$, where $V$ is a continuous potential, is a fundamental task across physics, chemistry, biology, computer science, and statistics. However, when $V$ is non-convex, the resulting distribution becomes non-logconcave, and classical methods such as Langevin dynamics often exhibit poor performance. We introduce the first quantum algorithm that provably accelerates a broad class of continuous-time sampling dynamics. For Langevin dynamics, our method encodes the target Gibbs measure into the amplitudes of a quantum state, identified as the kernel of a block matrix derived from a factorization of the Witten Laplacian operator. This connection enables Gibbs sampling via singular value thresholding and yields the first provable quantum advantage with respect to the Poincaré constant in the non-logconcave setting. Building on this framework, we further develop the first quantum algorithm that accelerates replica exchange Langevin diffusion, a widely used method for sampling from complex, rugged energy landscapes.

AIOct 21, 2024
RAG4ITOps: A Supervised Fine-Tunable and Comprehensive RAG Framework for IT Operations and Maintenance

Tianyang Zhang, Zhuoxuan Jiang, Shengguang Bai et al.

With the ever-increasing demands on Question Answering (QA) systems for IT operations and maintenance, an efficient and supervised fine-tunable framework is necessary to ensure the data security, private deployment and continuous upgrading. Although Large Language Models (LLMs) have notably improved the open-domain QA's performance, how to efficiently handle enterprise-exclusive corpora and build domain-specific QA systems are still less-studied for industrial applications. In this paper, we propose a general and comprehensive framework based on Retrieval Augmented Generation (RAG) and facilitate the whole business process of establishing QA systems for IT operations and maintenance. In accordance with the prevailing RAG method, our proposed framework, named with RAG4ITOps, composes of two major stages: (1) Models Fine-tuning \& Data Vectorization, and (2) Online QA System Process. At the Stage 1, we leverage a contrastive learning method with two negative sampling strategies to fine-tune the embedding model, and design the instruction templates to fine-tune the LLM with a Retrieval Augmented Fine-Tuning method. At the Stage 2, an efficient process of QA system is built for serving. We collect enterprise-exclusive corpora from the domain of cloud computing, and the extensive experiments show that our method achieves superior results than counterparts on two kinds of QA tasks. Our experiment also provide a case for applying the RAG4ITOps to real-world enterprise-level applications.

MLMay 24, 2024
Canonical Variates in Wasserstein Metric Space

Jia Li, Lin Lin

In this paper, we address the classification of instances each characterized not by a singular point, but by a distribution on a vector space. We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms such as k-nearest neighbors, k-means, and pseudo-mixture modeling. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, we define both between-class and within-class variations as the average squared distances between pairs of instances, with the pairs either belonging to the same class or to different classes. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. We conduct empirical studies to assess the algorithm's convergence and, through experimental validation, demonstrate that our dimension reduction technique substantially enhances classification performance. Moreover, our method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness against variations in the distributional representations of data clouds.

AIMar 6, 2025
MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs

Tianyang Zhang, Zhuoxuan Jiang, Haotian Zhang et al.

We propose a novel system, MathMistake Checker, designed to automate step-by-step mistake finding in mathematical problems with lengthy answers through a two-stage process. The system aims to simplify grading, increase efficiency, and enhance learning experiences from a pedagogical perspective. It integrates advanced technologies, including computer vision and the chain-of-thought capabilities of the latest large language models (LLMs). Our system supports open-ended grading without reference answers and promotes personalized learning by providing targeted feedback. We demonstrate its effectiveness across various types of math problems, such as calculation and word problems.

GNJan 29, 2025
Constructing Cell-type Taxonomy by Optimal Transport with Relaxed Marginal Constraints

Sebastian Pena, Lin Lin, Jia Li

The rapid emergence of single-cell data has facilitated the study of many different biological conditions at the cellular level. Cluster analysis has been widely applied to identify cell types, capturing the essential patterns of the original data in a much more concise form. One challenge in the cluster analysis of cells is matching clusters extracted from datasets of different origins or conditions. Many existing algorithms cannot recognize new cell types present in only one of the two samples when establishing a correspondence between clusters obtained from two samples. Additionally, when there are more than two samples, it is advantageous to align clusters across all samples simultaneously rather than performing pairwise alignment. Our approach aims to construct a taxonomy for cell clusters across all samples to better annotate these clusters and effectively extract features for downstream analysis. A new system for constructing cell-type taxonomy has been developed by combining the technique of Optimal Transport with Relaxed Marginal Constraints (OT-RMC) and the simultaneous alignment of clusters across multiple samples. OT-RMC allows us to address challenges that arise when the proportions of clusters vary substantially between samples or when some clusters do not appear in all the samples. Experiments on more than twenty datasets demonstrate that the taxonomy constructed by this new system can yield highly accurate annotation of cell types. Additionally, sample-level features extracted based on the taxonomy result in accurate classification of samples.

IROct 14, 2025
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

Lin Lin, Jiefeng Long, Zhihe Wan et al. · pku

Multimodal embedding models aim to yield informative unified representations that empower diverse cross-modal tasks. Despite promising developments in the evolution from CLIP-based dual-tower architectures to large vision-language models, prior works still face unavoidable challenges in real-world applications and business scenarios, such as the limited modality support, unstable training mechanisms, and industrial domain gaps. In this work, we introduce SAIL-Embedding, an omni-modal embedding foundation model that addresses these issues through tailored training strategies and architectural design. In the optimization procedure, we propose a multi-stage training scheme to boost the multifaceted effectiveness of representation learning. Specifically, the content-aware progressive training aims to enhance the model's adaptability to diverse downstream tasks and master enriched cross-modal proficiency. The collaboration-aware recommendation enhancement training further adapts multimodal representations for recommendation scenarios by distilling knowledge from sequence-to-item and ID-to-item embeddings while mining user historical interests. Concurrently, we develop the stochastic specialization and dataset-driven pattern matching to strengthen model training flexibility and generalizability. Experimental results show that SAIL-Embedding achieves SOTA performance compared to other methods in different retrieval tasks. In online experiments across various real-world scenarios integrated with our model, we observe a significant increase in Lifetime (LT), which is a crucial indicator for the recommendation experience. For instance, the model delivers the 7-day LT gain of +0.5% in the Douyin-Selected scenario. For the Douyin feed rank model, the match features produced by SAIL-Embedding yield a +0.1% AUC gain.

LGAug 28, 2025
Fast Convergence Rates for Subsampled Natural Gradient Algorithms on Quadratic Model Problems

Gil Goldshlager, Jiang Hu, Lin Lin

Subsampled natural gradient descent (SNGD) has shown impressive results for parametric optimization tasks in scientific machine learning, such as neural network wavefunctions and physics-informed neural networks, but it has lacked a theoretical explanation. We address this gap by analyzing the convergence of SNGD and its accelerated variant, SPRING, for idealized parametric optimization problems where the model is linear and the loss function is strongly convex and quadratic. In the special case of a least-squares loss, namely the standard linear least-squares problem, we prove that SNGD is equivalent to a regularized Kaczmarz method while SPRING is equivalent to an accelerated regularized Kaczmarz method. As a result, by leveraging existing analyses we obtain under mild conditions (i) the first fast convergence rate for SNGD, (ii) the first convergence guarantee for SPRING in any setting, and (iii) the first proof that SPRING can accelerate SNGD. In the case of a general strongly convex quadratic loss, we extend the analysis of the regularized Kaczmarz method to obtain a fast convergence rate for SNGD under stronger conditions, providing the first explanation for the effectiveness of SNGD outside of the least-squares setting. Overall, our results illustrate how tools from randomized linear algebra can shed new light on the interplay between subsampling and curvature-aware optimization strategies.

COMP-PHJan 18, 2024
A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions

Gil Goldshlager, Nilin Abrahamsen, Lin Lin

Neural network wavefunctions optimized using the variational Monte Carlo method have been shown to produce highly accurate results for the electronic structure of atoms and small molecules, but the high cost of optimizing such wavefunctions prevents their application to larger systems. We propose the Subsampled Projected-Increment Natural Gradient Descent (SPRING) optimizer to reduce this bottleneck. SPRING combines ideas from the recently introduced minimum-step stochastic reconfiguration optimizer (MinSR) and the classical randomized Kaczmarz method for solving linear least-squares problems. We demonstrate that SPRING outperforms both MinSR and the popular Kronecker-Factored Approximate Curvature method (KFAC) across a number of small atoms and molecules, given that the learning rates of all methods are optimally tuned. For example, on the oxygen atom, SPRING attains chemical accuracy after forty thousand training iterations, whereas both MinSR and KFAC fail to do so even after one hundred thousand iterations.

QUANT-PHMar 30, 2022
Monte Carlo Tree Search based Hybrid Optimization of Variational Quantum Circuits

Jiahao Yao, Haoya Li, Marin Bukov et al.

Variational quantum algorithms stand at the forefront of simulations on near-term and future fault-tolerant quantum devices. While most variational quantum algorithms involve only continuous optimization variables, the representational power of the variational ansatz can sometimes be significantly enhanced by adding certain discrete optimization variables, as is exemplified by the generalized quantum approximate optimization algorithm (QAOA). However, the hybrid discrete-continuous optimization problem in the generalized QAOA poses a challenge to the optimization. We propose a new algorithm called MCTS-QAOA, which combines a Monte Carlo tree search method with an improved natural policy gradient solver to optimize the discrete and continuous variables in the quantum circuit, respectively. We find that MCTS-QAOA has excellent noise-resilience properties and outperforms prior algorithms in challenging instances of the generalized QAOA.

COMP-PHDec 7, 2021
Explicitly antisymmetrized neural network layers for variational Monte Carlo simulation

Jeffmin Lin, Gil Goldshlager, Lin Lin

The combination of neural networks and quantum Monte Carlo methods has arisen as a path forward for highly accurate electronic structure calculations. Previous proposals have combined equivariant neural network layers with an antisymmetric layer to satisfy the antisymmetry requirements of the electronic wavefunction. However, to date it is unclear if one can represent antisymmetric functions of physical interest, and it is difficult to measure the expressiveness of the antisymmetric layer. This work attempts to address this problem by introducing explicitly antisymmetrized universal neural network layers as a diagnostic tool. We first introduce a generic antisymmetric (GA) layer, which we use to replace the entire antisymmetric layer of the highly accurate ansatz known as the FermiNet. We demonstrate that the resulting FermiNet-GA architecture can yield effectively the exact ground state energy for small systems. We then consider a factorized antisymmetric (FA) layer which more directly generalizes the FermiNet by replacing products of determinants with products of antisymmetrized neural networks. Interestingly, the resulting FermiNet-FA architecture does not outperform the FermiNet. This suggests that the sum of products of antisymmetries is a key limiting aspect of the FermiNet architecture. To explore this further, we investigate a slight modification of the FermiNet called the full determinant mode, which replaces each product of determinants with a single combined determinant. The full single-determinant FermiNet closes a large part of the gap between the standard single-determinant FermiNet and FermiNet-GA. Surprisingly, on the nitrogen molecule at a dissociating bond length of 4.0 Bohr, the full single-determinant FermiNet can significantly outperform the standard 64-determinant FermiNet, yielding an energy within 0.4 kcal/mol of the best available computational benchmark.

MLOct 25, 2021
Adaptive Weighted Multi-View Clustering

Shuo Shuo Liu, Lin Lin

Learning multi-view data is an emerging problem in machine learning research, and nonnegative matrix factorization (NMF) is a popular dimensionality-reduction method for integrating information from multiple views. These views often provide not only consensus but also complementary information. However, most multi-view NMF algorithms assign equal weight to each view or tune the weight via line search empirically, which can be infeasible without any prior knowledge of the views or computationally expensive. In this paper, we propose a weighted multi-view NMF (WM-NMF) algorithm. In particular, we aim to address the critical technical gap, which is to learn both view-specific weight and observation-specific reconstruction weight to quantify each view's information content. The introduced weighting scheme can alleviate unnecessary views' adverse effects and enlarge the positive effects of the important views by assigning smaller and larger weights, respectively. Experimental results confirm the effectiveness and advantages of the proposed algorithm in terms of achieving better clustering performance and dealing with the noisy data compared to the existing algorithms.

LGAug 5, 2021
Mixture of Linear Models Co-supervised by Deep Neural Networks

Beomseok Seo, Lin Lin, Jia Li

Deep neural network (DNN) models have achieved phenomenal success for applications in many domains, ranging from academic research in science and engineering to industry and business. The modeling power of DNN is believed to have come from the complexity and over-parameterization of the model, which on the other hand has been criticized for the lack of interpretation. Although certainly not true for every application, in some applications, especially in economics, social science, healthcare industry, and administrative decision making, scientists or practitioners are resistant to use predictions made by a black-box system for multiple reasons. One reason is that a major purpose of a study can be to make discoveries based upon the prediction function, e.g., to reveal the relationships between measurements. Another reason can be that the training dataset is not large enough to make researchers feel completely sure about a purely data-driven result. Being able to examine and interpret the prediction function will enable researchers to connect the result with existing knowledge or gain insights about new directions to explore. Although classic statistical models are much more explainable, their accuracy often falls considerably below DNN. In this paper, we propose an approach to fill the gap between relatively simple explainable models and DNN such that we can more flexibly tune the trade-off between interpretability and accuracy. Our main idea is a mixture of discriminative models that is trained with the guidance from a DNN. Although mixtures of discriminative models have been studied before, our way of generating the mixture is quite different.

MLJan 24, 2021
NeurT-FDR: Controlling FDR by Incorporating Feature Hierarchy

Lin Qiu, Nils Murrugarra-Llerena, Vítor Silva et al.

Controlling false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-level covariates. We propose NeurT-FDR which boosts statistical power and controls FDR for multiple hypothesis testing while leveraging the hierarchy among test-level covariates. Our method parametrizes the test-level covariates as a neural network and adjusts the feature hierarchy through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR has strong FDR guarantees and makes substantially more discoveries in synthetic and real datasets compared to competitive baselines.

QUANT-PHDec 12, 2020
Noise-Robust End-to-End Quantum Control using Deep Autoregressive Policy Networks

Jiahao Yao, Paul Köttering, Hans Gundlach et al.

Variational quantum eigensolvers have recently received increased attention, as they enable the use of quantum computing devices to find solutions to complex problems, such as the ground energy and ground state of strongly-correlated quantum many-body systems. In many applications, it is the optimization of both continuous and discrete parameters that poses a formidable challenge. Using reinforcement learning (RL), we present a hybrid policy gradient algorithm capable of simultaneously optimizing continuous and discrete degrees of freedom in an uncertainty-resilient way. The hybrid policy is modeled by a deep autoregressive neural network to capture causality. We employ the algorithm to prepare the ground state of the nonintegrable quantum Ising model in a unitary process, parametrized by a generalized quantum approximate optimization ansatz: the RL agent solves the discrete combinatorial problem of constructing the optimal sequences of unitaries out of a predefined set and, at the same time, it optimizes the continuous durations for which these unitaries are applied. We demonstrate the noise-robust features of the agent by considering three sources of uncertainty: classical and quantum measurement noise, and errors in the control unitary durations. Our work exhibits the beneficial synergy between reinforcement learning and quantum control.

MLOct 11, 2020
Efficient Long-Range Convolutions for Point Clouds

Yifan Peng, Lin Lin, Lexing Ying et al.

The efficient treatment of long-range interactions for point clouds is a challenging problem in many scientific machine learning applications. To extract global information, one usually needs a large window size, a large number of layers, and/or a large number of channels. This can often significantly increase the computational cost. In this work, we present a novel neural network layer that directly incorporates long-range information for a point cloud. This layer, dubbed the long-range convolutional (LRC)-layer, leverages the convolutional theorem coupled with the non-uniform Fourier transform. In a nutshell, the LRC-layer mollifies the point cloud to an adequately sized regular grid, computes its Fourier transform, multiplies the result by a set of trainable Fourier multipliers, computes the inverse Fourier transform, and finally interpolates the result back to the point cloud. The resulting global all-to-all convolution operation can be performed in nearly-linear time asymptotically with respect to the number of input points. The LRC-layer is a particularly powerful tool when combined with local convolution as together they offer efficient and seamless treatment of both short and long range interactions. We showcase this framework by introducing a neural network architecture that combines LRC-layers with short-range convolutional layers to accurately learn the energy and force associated with a $N$-body potential. We also exploit the induced two-level decomposition and propose an efficient strategy to train the combined architecture with a reduced number of samples.

QUANT-PHOct 7, 2020
Reinforcement Learning for Many-Body Ground-State Preparation Inspired by Counterdiabatic Driving

Jiahao Yao, Lin Lin, Marin Bukov

The quantum alternating operator ansatz (QAOA) is a prominent example of variational quantum algorithms. We propose a generalized QAOA called CD-QAOA, which is inspired by the counterdiabatic driving procedure, designed for quantum many-body systems and optimized using a reinforcement learning (RL) approach. The resulting hybrid control algorithm proves versatile in preparing the ground state of quantum-chaotic many-body spin chains by minimizing the energy. We show that using terms occurring in the adiabatic gauge potential as generators of additional control unitaries, it is possible to achieve fast high-fidelity many-body control away from the adiabatic regime. While each unitary retains the conventional QAOA-intrinsic continuous control degree of freedom such as the time duration, we consider the order of the multiple available unitaries appearing in the control sequence as an additional discrete optimization problem. Endowing the policy gradient algorithm with an autoregressive deep learning architecture to capture causality, we train the RL agent to construct optimal sequences of unitaries. The algorithm has no access to the quantum state, and we find that the protocol learned on small systems may generalize to larger systems. By scanning a range of protocol durations, we present numerical evidence for a finite quantum speed limit in the nonintegrable mixed-field spin-1/2 Ising and Lipkin-Meshkov-Glick models, and for the suitability to prepare ground states of the spin-1 Heisenberg chain in the long-range and topologically ordered parameter regimes. This work paves the way to incorporate recent success from deep learning for the purpose of quantum many-body control.